Dissecting BFT Consensus: In Trusted Components we Trust!


Suyash Gupta
RISELab, University of California, Berkeley

Sajjad Rahnama
Exploratory Systems Lab, University of California, Davis

Shubham Pandey
Exploratory Systems Lab, University of California, Davis

Natacha Crooks
RISELab, University of California, Berkeley

Mohammad Sadoghi
Exploratory Systems Lab, University of California, Davis

ABSTRACT

The growing interest in reliable multi-party applications has fostered widespread adoption of Byzantine Fault-Tolerant (bft) consensus protocols. Existing bft protocols need f more replicas than Paxos-style protocols to prevent equivocation attacks. trust-bft protocols instead seek to minimize this cost by making use of trusted components at replicas.

This paper makes two contributions. First, we analyze the design of existing trust-bft protocols and uncover three limitations that preclude most practical deployments. Some of these limitations are fundamental, while others are linked to the state of trusted components today. Second, we introduce a novel suite of consensus protocols, FlexiTrust, that attempts to sidestep these issues. We show that our FlexiTrust protocols achieve up to 185% more throughput than their trust-bft counterparts.

1 INTRODUCTION

Byzantine Fault-Tolerant (bft) protocols allow multiple parties to perform shared computations and reliably store data without fully trusting each other; the system will remain correct even if a subset of participants behave maliciously [8, 24, 26, 30, 56]. These protocols aim to ensure that all the replicas reach a consensus on the order of incoming client requests. The popularity of these bft protocols is evident from the fact that they are widely used today, with applications ranging from safeguarding replicated and sharded databases [27, 50], edge applications [40], and blockchain applications [2, 39, 54].

bft protocols tolerate a subset of participants behaving arbitrarily: a malicious actor can delay, reorder, or drop messages (omission faults); it can also send conflicting information to participants (equivocation). As a result, bft consensus protocols are costly: maintaining correctness in the presence of such attacks requires a minimum of 3f + 1 participants, where f participants can be malicious. This is in contrast to crash-fault tolerant (cft) protocols, where participants can fail only by crashing, and which require just 2f + 1 participants for correctness [31, 42].

To minimise this additional cost, some existing bft protocols leverage trusted components to curb the degree to which participants can behave arbitrarily [16, 17, 32]. A trusted component provably performs a specific computation: incrementing a counter [33, 52], appending to a log [11], or more advanced options like executing a complex algorithm [18]. While there exist many bft protocols that leverage these trusted components [11, 33, 55] (we refer to these protocols as trust-bft protocols for simplicity), they all proceed in a similar fashion: they force each replica to commit to a single order for each request by having the trusted component sign each message it sends. In turn, each trusted component either: (1) records the chosen order for a client request in an append-only log, or (2) binds this order for the request to the value of a monotonically increasing counter. Committing to an order in this way allows these protocols to remain safe with only 2f + 1 replicas, bringing them in line with their cft counterparts.

While reducing replication cost is a significant benefit, this paper argues that current trust-bft protocols place too much trust in trusted components. Our analysis uncovers three fundamental issues with existing trust-bft implementations that preclude most practical deployments: (i) limited liveness for clients, (ii) safety concerns associated with trusted components, and (iii) inability to perform multiple consensuses in parallel.

Loss of responsiveness with a single message delay. We observe that a single message delay is enough for byzantine replicas to prevent a client from receiving a response for its transactions. While the transaction may appear committed (replica liveness), the system will appear stalled, and thus non-responsive, to clients. This issue exists because trust-bft protocols allow a reduced quorum size of f + 1 replicas to commit a request.

Loss of Safety under Rollback Attacks. Existing trust-bft protocols consider an idealised model of trusted computations. They assume that the trusted components cannot be compromised and that their data remains persistent in the presence of a malicious host. This assumption has two caveats: (1) a large number of these protocols employ Intel SGX enclaves for trusted computing [6, 18, 48]. Unfortunately, SGX-based designs have been shown to suffer from rollback attacks [28, 38, 53], and the solutions to mitigate these attacks lack practical deployments [20]. (2) Alternatively, developers can make use of persistent counters [16] and TPMs [22], but these designs have prohibitively high latencies (tens of milliseconds) [33, 43].

Sequential Consensus. Existing trust-bft protocols are inherently sequential as they require each outgoing message to be ordered and attested by trusted components. While recent work mitigates this issue by pipelining consensus phases [18] or running multiple independent consensus invocations [6], their performance remains fundamentally limited. In fact, despite their lower replication factor, we observe that trust-bft protocols achieve lower throughput than traditional out-of-order bft protocols [8, 26, 41, 50].

This paper argues that trust-bft protocols have targeted the wrong metric: 2f + 1 replicas are not enough, even when leveraging trusted hardware. Trusted components, however, can still bring huge benefits to bft consensus when they use 3f + 1 replicas.


We propose a novel suite of consensus algorithms, Flexible Trusted bft (FlexiTrust), which addresses the aforementioned limitations. These protocols are always live and achieve high throughput as they (1) make minimal use of trusted components (once per consensus, and at the primary replica only), and (2) support parallel consensus invocations. Both properties are made possible by the use of larger quorums of size 2f + 1, enabled by the increased replication factor. Our techniques can be used to convert any trust-bft protocol into a FlexiTrust protocol. We provide two such conversions as examples: Flexi-BFT and Flexi-ZZ, two protocols based on Pbft [8]/MinBFT [52] and Zyzzyva [30]/MinZZ [52], respectively. Flexi-BFT follows a similar structure to Pbft, but requires one less communication phase. Flexi-ZZ is, we believe, of independent interest: the protocol achieves consensus in a single linear phase without using expensive cryptographic constructs such as threshold signatures. Crucially, unlike the Zyzzyva and MinZZ protocols, Flexi-ZZ continues to commit in a single round even when a single participant misbehaves, thus maintaining high throughput [13, 14, 26]. Further, Flexi-ZZ does not suffer from the safety bug in Zyzzyva [1].

We evaluate our FlexiTrust variants against five protocols: three trust-bft and two bft systems. For fair evaluation, we implement these protocols on the open-source ResilientDB fabric [25, 26, 41]. Our evaluation on a real-world setup of 97 replicas and up to 80 k clients shows that our FlexiTrust protocols achieve up to 185% and 100% more throughput than their trust-bft and bft counterparts, respectively. Further, we show that our FlexiTrust protocols continue outperforming these protocols even when replicas are distributed across six locations in four continents. In summary, we make the following four contributions:

• We identify responsiveness issues in existing trust-bft protocols, even under a single message delay.

• We highlight that rollback attacks on trusted components limit practical deployments.

• We observe that existing trust-bft protocols are inherently sequential. The lack of support for parallel consensus invocations artificially limits throughput compared to traditional bft protocols.

• We present FlexiTrust protocols: a novel suite of protocols that support concurrent consensus invocations and require minimal access to the trusted components.

2 SYSTEM MODEL AND NOTATIONS

We adopt the standard communication and failure model used by most bft protocols [8, 23, 26, 50], including all existing trust-bft protocols [6, 11, 18, 52, 55].

We consider a replicated service S consisting of a set R of n replicas, of which at most f can behave arbitrarily. The remaining n − f are honest: they follow the protocol and remain live. We additionally assume the existence of a finite set of clients C, of which arbitrarily many can be malicious.

Similarly, we inherit standard communication assumptions: we assume the existence of authenticated channels (Byzantine replicas can impersonate each other, but no replica can impersonate an honest replica) and standard cryptographic primitives such as MACs and digital signatures (DS). We denote a message 𝑚 signed by a replica r using DS as ⟨𝑚⟩r. We employ a collision-resistant hash function Hash(·) to map an arbitrary value 𝑣 to a constant-sized digest Hash(𝑣). We adopt the same partial synchrony model used in most bft systems: safety is guaranteed in an asynchronous environment where messages can get lost, delayed, or duplicated. Liveness, however, is only guaranteed during periods of synchrony [8, 11, 26, 50, 56].
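As a concrete illustration of this notation, the following Python sketch composes the two primitives. SHA-256 and ED25519 match the primitives used by our implementation (Section 9.1), but the helper names digest and sign_message are illustrative only, not part of any protocol.

import hashlib
from cryptography.hazmat.primitives.asymmetric import ed25519

def digest(value: bytes) -> bytes:
    # Hash(v): collision-resistant mapping to a constant-sized digest.
    return hashlib.sha256(value).digest()

def sign_message(key: ed25519.Ed25519PrivateKey, m: bytes) -> bytes:
    # <m>_r: message m signed with replica r's private key.
    return key.sign(m)

# A replica signs the digest of a transaction; any holder of the public
# key can verify it, which is part of checking well-formedness below.
key = ed25519.Ed25519PrivateKey.generate()
m = b"transaction T"
signature = sign_message(key, digest(m))
key.public_key().verify(signature, digest(m))  # raises InvalidSignature if forged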

Each replica only accepts a message if it is well-formed. A well-formed message has a valid signature and passes all the protocol checks. Any consensus protocol is expected to uphold the following guarantees:

Safety. If two honest replicas r1 and r2 order a transaction 𝑇 at sequence numbers 𝑘 and 𝑘′, then 𝑘 = 𝑘′.

Liveness. If a client sends a transaction 𝑇, then it will eventually receive a response for 𝑇.

We additionally assume that our replicated service S includes a set I of trusted components. These trusted components offer the following abstraction:

Definition 1. A trusted component t ∈ I is a cryptographically secure entity, which has a negligible probability of being compromised by byzantine adversaries. t provides access to a function foo() and, when called, always computes foo(). No replica r can coerce t to compute anything but foo().

Existing trust-bft protocols assume that each replica r ∈ R has access to a co-located trusted component. We use the notation tr to denote the trusted component at the “host” replica r. As tr computes foo() and tr cannot be compromised by the host r, existing trust-bft protocols claim integrity in the computation of foo().

3 PRIMER ON BFT CONSENSUS

To highlight the limitations of trust-bft protocols, we first provide the necessary background on bft consensus. To this effect, we summarize the structure of Pbft-like systems, which represent most modern bft protocols today [21, 23, 26, 30, 41, 56]. For simplicity of exposition, we focus specifically on Pbft [8].

Pbft adopts the system and communication model previously described and guarantees safety and liveness for n = 3f + 1 replicas, where at most f can be Byzantine. The protocol follows a primary-backup model: one replica is designated as the primary (or leader), while the remaining replicas act as backups. At a high level, all bft protocols consist of two logical phases: an agreement phase, in which replicas agree to commit a specific operation, and a durability phase, which ensures this decision persists even when the leader is malicious and replaced as part of a view-change (view-changes are the process through which leaders are replaced). For ease of understanding, we focus on explaining the simplest failure-free run of Pbft in Figure 1.

(1) Client Request. A client 𝑐 submits a new transaction 𝑇 by sending a signed message ⟨𝑇⟩𝑐 to the primary replica p.

(2) Pre-prepare. When the primary replica p receives a well-formed client request 𝑚, it assigns the corresponding transaction a new sequence number 𝑘 and sends a Preprepare message to all backups.

Figure 1: Schematic representation of the Pbft protocol. The byzantine replica (in red) may decide to not participate in the consensus protocol.

(3) Prepare. When a replica r ∈ R receives, for the first time, a well-formed Preprepare message from p for a given sequence number 𝑘, r broadcasts to all other replicas a Prepare message agreeing to support this (sequence number, transaction) pairing.

(4) Commit. When a replica r receives identical Prepare messages from 2f + 1 replicas, it marks the request 𝑚 as prepared and broadcasts a Commit message to all replicas. When a request is prepared, a replica has the guarantee that no conflicting request 𝑚′ for the same sequence number will ever be prepared by this leader.

(5) Execute. Upon receiving 2f + 1 matching Commit messages, r marks the request as committed and executable. Each replica executes every request in sequence number order: the node waits until the request for slot 𝑘 − 1 has successfully executed before executing the transaction at slot 𝑘. Finally, the replica returns the result of the operation to the client 𝑐. The client 𝑐 waits for identical responses from f + 1 participants and marks the request as complete. Waiting for f + 1 matching responses ensures execution correctness, as at least one honest replica can vouch for the result.
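The in-order execution rule in step (5) is easy to get wrong when commits arrive out of order. The Python sketch below shows one standard way to implement it; the class and its names are our own illustration, not part of Pbft.

class InOrderExecutor:
    """Executes committed requests strictly in sequence-number order."""

    def __init__(self):
        self.next_slot = 0   # lowest sequence number not yet executed
        self.pending = {}    # slot -> committed transaction awaiting its turn

    def on_committed(self, slot: int, txn) -> list:
        # Park the newly committed transaction, then drain every
        # consecutive slot that is now ready to run.
        self.pending[slot] = txn
        executed = []
        while self.next_slot in self.pending:
            executed.append(self.pending.pop(self.next_slot))
            self.next_slot += 1
        return executed  # execute these, in order, and reply to the clients

Only transactions returned by on_committed are executed and answered, which is why a client must collect f + 1 matching responses rather than trust any single replica.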

4 TRUSTED BFT CONSENSUS

bft protocols successfully implement consensus, but at a cost: they require a higher replication factor (3f + 1) and, in turn, must process significantly more messages (all of which must be authenticated and whose signatures/MACs must be verified). Equivocation is the primary culprit: a byzantine replica can tell one group of honest replicas that it plans to order a transaction 𝑇 before transaction 𝑇′, and tell another group of replicas that it will order 𝑇′ before 𝑇.

In a cft system, it is sufficient for quorums to intersect in a single replica to achieve agreement. In bft systems, as byzantine replicas can equivocate, quorums must instead intersect in one honest replica (or, in other words, must intersect in at least f + 1 replicas).
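A one-line counting argument makes the f + 1 figure concrete: with n = 3f + 1 replicas and quorums of size 2f + 1, any two quorums Q1 and Q2 satisfy

|Q1 ∩ Q2| ≥ |Q1| + |Q2| − n = 2(2f + 1) − (3f + 1) = f + 1,

so even if all f byzantine replicas lie in the intersection, at least one of its members is honest.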

To address this increased cost, trust-bft protocols make use of trusted components (such as Intel SGX, AWS Nitro) at each node [11, 12]. Trusted components cannot, by assumption, be compromised by malicious actors. They can thus be used to prevent replicas from equivocating. In theory, this should be sufficient to safely convert any bft protocol into a cft system [12]. In practice, however, as we highlight in this work, this process is less than straightforward.

In the remainder of this section, we first provide the necessary background on the API and guarantees offered by trusted components before describing how they are used in trust-bft protocols.

4.1 Trusted Component Implementations

One approach suggests running a full application [3, 44, 47, 51] inside a trusted component like Intel SGX to shield it from byzantine attacks. However, this approach violates the principle of least privilege, as the attacker needs to find just one vulnerability to compromise the entire system [35, 46]. Further, trusted components have small memories, which makes it difficult to encapsulate the entire system in practice.

Instead, existing protocols choose to put the smallest amount of computation inside the trusted component, which has the benefit of allowing custom hardware implementations that are more secure [11, 33]. There are two primary approaches: append-only logs, and monotonically increasing counters. Careful use of these primitives allows prior work to reduce the replication factor from 3f + 1 to 2f + 1.

Trusted Logs. trust-bft protocols like Pbft-EA [11] and HotStuff-M [55] maintain, in each trusted component tr, a set of append-only logs. Each log 𝑞 has a set of slots. We refer to each slot using an identifier 𝑘. Each trusted component tr offers the following API.

(1) Append(𝑞, 𝑘𝑛𝑒𝑤, 𝑥) – Assume the last append to log 𝑞 of trusted component tr was at slot 𝑘. Then:

• If 𝑘𝑛𝑒𝑤 = ⊥ (no slot location is specified), tr sets 𝑘 to 𝑘 + 1 and appends the value 𝑥 to slot 𝑘.

• If 𝑘𝑛𝑒𝑤 > 𝑘, tr appends 𝑥 to slot 𝑘𝑛𝑒𝑤 and updates 𝑘 to 𝑘𝑛𝑒𝑤. The slots in between can no longer be used.

(2) Lookup(𝑞, 𝑘) – If there exists a value 𝑥 at slot 𝑘 in log 𝑞 of tr, it returns an attestation ⟨Attest(𝑞, 𝑘, 𝑥)⟩tr.

The Append function ensures that no two requests are ever logged at the same slot; the Lookup function confirms this fact with a digitally-signed assertion ⟨Attest(𝑞, 𝑘, 𝑥)⟩tr, where 𝑘 is the log position and 𝑥 is the stored message.
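A minimal Python sketch of these semantics follows; it is our own illustration, with an HMAC key standing in for the component's signing identity (a real component runs inside trusted hardware and attests with a DS).

import hashlib
import hmac

class TrustedLog:
    """Sketch of the trusted append-only log abstraction."""

    def __init__(self, secret: bytes):
        self._secret = secret   # stand-in for the component's signing key
        self._logs = {}         # q -> {slot k: value x}
        self._last = {}         # q -> last used slot k

    def append(self, q, k_new, x):
        log = self._logs.setdefault(q, {})
        k = self._last.get(q, 0)
        if k_new is None:        # k_new = ⊥: use the next slot
            k = k + 1
        elif k_new > k:          # jump forward; skipped slots are burned
            k = k_new
        else:                    # appends are strictly monotonic
            raise ValueError("slot already used")
        log[k] = x
        self._last[q] = k
        return self._attest(q, k, x)

    def lookup(self, q, k):
        x = self._logs.get(q, {}).get(k)
        return None if x is None else self._attest(q, k, x)

    def _attest(self, q, k, x):
        # <Attest(q, k, x)>_tr: signed statement that x sits at slot k of log q.
        payload = f"{q}|{k}|{x}".encode()
        return (q, k, x, hmac.new(self._secret, payload, hashlib.sha256).digest())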

Trusted Counters. A separate line of work further restricts the scope of the trusted component. Rather than storing an attested log of messages, it simply stores a set of monotonically increasing counters [29, 52]. These counters do not provide a Lookup function, as they do not store messages. Instead, the Append(𝑞, 𝑘𝑛𝑒𝑤, 𝑥) function, at the time of invocation, returns an attestation proof ⟨Attest(𝑞, 𝑘𝑛𝑒𝑤, 𝑥)⟩tr, which states that the 𝑞-th counter advanced its current value 𝑘 to 𝑘𝑛𝑒𝑤 and binds the updated value to message 𝑥.
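The counter variant admits an even smaller sketch. Note that the caller supplies the new value here; this contrasts with the AppendF function of Section 8.1, where the component increments internally. Names are again ours.

class TrustedCounter:
    """Sketch of monotonic attested counters: no log, no Lookup."""

    def __init__(self, sign):
        self._sign = sign    # signing function of the trusted component
        self._values = {}    # q -> current counter value k

    def append(self, q, k_new, x):
        if k_new <= self._values.get(q, 0):
            raise ValueError("counter values must strictly increase")
        self._values[q] = k_new
        # The attestation binds the new counter value to message x.
        return self._sign(f"{q}|{k_new}|{x}".encode())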

These two designs are not mutually exclusive; protocols like Trinc [33], Hybster [6] and Damysus [18] require their trusted components to support both logs and counters, where the logs record the last few client requests.

4.2 trust-bft Protocols

Existing trust-bft protocols expect that in a system of n = 2f + 1 replicas, at most f replicas are byzantine. Most trust-bft protocols [6, 11, 29, 33, 52, 55] follow a similar design, based on the Pbft-EA [11] protocol, which we describe next. Pbft-EA is derived from Pbft and makes use of a set of trusted logs. Figure 2 illustrates each step.

Pbft-EA protocol steps. In the Pbft-EA protocol, tr stores five distinct sets of logs, corresponding to each phase of the consensus. When the primary p receives a client request 𝑚, it calls the Append(𝑞, 𝑘, 𝑚) function to assign 𝑚 a sequence number 𝑘. tp logs 𝑚 in the 𝑞-th preprepare log and returns an attestation ⟨Attest(𝑞, 𝑘, 𝑚)⟩tp, which p forwards to all replicas along with the Preprepare message. The existence of a trusted log precludes a primary from equivocating and sending two conflicting messages 𝑚 and 𝑚′ for the same sequence number 𝑘.


Figure 2: Schematic representation of the Pbft-EA protocol, where each replica has access to a trusted component.

When a replica r receives a well-formed Preprepare message for slot 𝑘, it creates a Prepare message 𝑚′ and calls the Append function on its tr to log 𝑚′. As a result, tr logs 𝑚′ in its 𝑞-th prepare log and returns an attestation ⟨Attest(𝑞, 𝑘, 𝑚′)⟩tr, which r forwards along with the Prepare message to all replicas.

Once r receives f + 1 identical Prepare messages from distinct replicas (including itself), it declares the transaction prepared. Following this, r repeats the process by creating a Commit message (say 𝑚′′) and asking its tr to log this message at slot 𝑘 in its 𝑞-th commit log. r broadcasts the signed attestation ⟨Attest(𝑞, 𝑘, 𝑚′′)⟩tr along with the Commit message. Once r receives f + 1 matching Commit messages for 𝑚 at slot 𝑘, it marks the operation committed. A client marks a request complete when it receives f + 1 identical responses.

Checkpoints. Like bft protocols, trust-bft protocols also periodically share checkpoints. These checkpoints reflect the state of a replica r's trusted component tr and enable log truncation. During the checkpoint phase, r sends Checkpoint messages that include all the requests committed since the last checkpoint. If tr employs trusted logs, then it also provides attestations for each logged request. If tr employs only trusted counters, tr provides an attestation on the current value of its counters. Each replica r marks its checkpoint as stable if it receives Checkpoint messages from f + 1 replicas, possibly including itself.

View Change. The existence of trusted logs/counters precludes the primary from equivocating, but it can still deny service and maliciously delay messages. The view-change mechanism enables replicas to replace a faulty primary; a view refers to the duration for which a specific replica was leader. All trust-bft protocols provide such a mechanism [11, 29]: a view change is triggered when f + 1 replicas send a Viewchange message. Requiring at least f + 1 messages ensures that malicious replicas cannot alone replace leaders. As in the common-case protocol, f + 1 replicas must participate in every view-change quorum.

This design summarizes, to the best of our knowledge, most existing trust-bft protocols, which adopt a similar structure. We summarize other approaches in Figure 3 and explain them in detail in Section 10.

Challenges. trust-bft protocols aim to provide the same safety and liveness guarantees as their bft counterparts. However, we identify several challenges with their designs. We observe that these protocols: (i) offer limited liveness for clients, (ii) encounter safety concerns associated with trusted components, and (iii) disallow consensus invocations from proceeding in parallel. We expand on these in the upcoming sections. To address these challenges, we present the design of our novel FlexiTrust protocols in Section 8.

5 RESTRICTED RESPONSIVENESS

We first identify a liveness issue in current trust-bft protocols: a single delayed message at an honest replica is sufficient to prevent a client from receiving a result for its transaction until the next checkpoint. For any production-scale system, responsiveness (or client liveness) is as important a metric as replica liveness (the operation committing at the replica). Specifically, trust-bft protocols guarantee that a quorum of f + 1 replicas will commit a request, but do not guarantee that sufficiently many honest replicas will actually execute it. These protocols cannot guarantee that clients will receive f + 1 identical responses from distinct replicas, which is the necessary threshold to validate execution results. The ease with which such a liveness issue can be triggered highlights the brittleness of current trust-bft approaches.

We highlight this issue using Pbft-EA as an example but, to the best of our knowledge, the same problem arises in all existing trust-bft protocols.

Claim 1. In a replicated system S of replicas R and clients C, where |R| = n = 2f + 1, a single delayed message can prevent a client from receiving its transaction results.

Proof. Assume a run of the Pbft-EA protocol (Figure 4). We know that f of the replicas in R are byzantine; we represent them with the set F. The remaining f + 1 replicas are honest. Let us distribute them into two groups, 𝐷 and r, such that |𝐷| = f and r is the remaining replica.

Assume that the primary p is byzantine (p ∈ F) and all replicas in F intentionally fail to send replicas in 𝐷 any messages. As is possible in a partially synchronous system, we further assume that the Prepare and Commit messages from the replica r to those in 𝐷 are delayed.

p sends a Preprepare message for a client 𝑐's transaction 𝑇 to all replicas in F and r. These f + 1 replicas are able to prepare and commit 𝑇. All messages from replica r to those in 𝐷 are delayed. It follows that r is the only honest replica to receive the transaction and reply to the client 𝑐. Replicas in F, in contrast, fail to respond. Unfortunately, the client needs f + 1 responses to validate the correctness of the executed operation, and thus cannot make progress.

Eventually, the client will complain to all the replicas that it has not received sufficient responses for its transaction 𝑇. Having not heard from the leader about 𝑇, replicas in 𝐷 will eventually vote to trigger a view-change. Unfortunately, a view-change requires at least f + 1 votes to proceed; otherwise, byzantine replicas could stall system progress by constantly triggering spurious view-changes. The system can thus no longer successfully execute operations: the client will never receive enough matching responses, and no view-change can be triggered to address this issue. □

This attack is not specific to Pbft-EA and applies to other protocols like Trinc, MinBFT, CheapBFT, and Hybster, as they have similar consensus phases and commit rules. It also applies to the streamlined protocols HotStuff-M and Damysus, which frequently rotate primaries.


Replicas | Trusted       | Liveness as bft | Out-of-Order | Memory            | Only primary needs active TC | Protocols
2f + 1   | Log           | ✗               | ✗            | High              | ✗                            | Pbft-EA, HotStuff-M
2f + 1   | Counter + Log | ✗               | ✗            | Order of log-size | ✗                            | Trinc, Hybster, Damysus
2f + 1   | Counter       | ✗               | ✗            | Low               | ✗                            | MinBFT, MinZZ, CheapBFT
3f + 1   | Counter       | ✓               | ✓            | Low               | ✓                            | FlexiTrust (Flexi-BFT, Flexi-ZZ)

Figure 3: Comparing trust-bft protocols. From left to right: Col 1: type of trusted abstraction; Col 2: identical liveness guarantees as bft protocols; Col 3: support for out-of-order consensus; Col 4: amount of memory needed; and Col 5: only the primary replica requires an active trusted component. FlexiTrust protocols are our proposals, which we present in this paper.

Figure 4: Disruption in service in the Pbft-EA protocol, due to weak quorums, in comparison to the Pbft protocol (Section 5).

Weak Quorums. The smaller set of replicas in trust-bft protocols is the culprit. A quorum of f + 1 replicas is sufficient to safely commit a transaction, as the order assigned to this client transaction is already attested by the trusted components. As a result, only a single honest replica (one out of f + 1) is guaranteed to know that a transaction successfully committed. While byzantine replicas can never equivocate [12], they can deny service to clients, for instance by choosing to never reply to the client after having committed the transaction. In this situation, the sole honest replica with the knowledge that this transaction has been committed is powerless (as it has a single vote) and cannot convince the client that its execution result is in fact valid. In traditional bft protocols, the larger quorum size of 2f + 1 guarantees both safety and responsiveness: f + 1 replicas out of 2f + 1 are guaranteed to be honest, and therefore both execute the request and reply to the client. We highlight this difference in Figure 4, where the same attack described above fails to stall the clients in Pbft.
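The same counting argument as in Section 4 shows exactly how weak these quorums are: with n = 2f + 1 and quorums of size f + 1, any two quorums Q1 and Q2 satisfy

|Q1 ∩ Q2| ≥ 2(f + 1) − (2f + 1) = 1,

so only a single replica, and hence at most a single honest replica, is guaranteed to witness both the commit and any later step that depends on it.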

Mechanisms do exist to sidestep this issue, but they come at a significant performance cost. It is, for instance, possible to bring honest replicas up-to-date using checkpointing. Unfortunately, this implies that clients will incur latency directly dependent on checkpoint frequency (checkpoints tend to be relatively infrequent). Alternatively, replicas could systematically broadcast their commit-certificate to other replicas in the system. Each such commit-certificate would include confirmations that f + 1 replicas agreed to commit this transaction; it is thus safe for other replicas to also execute the transaction upon receiving this certificate. Unfortunately, this requires an additional all-to-all communication phase, negating many of the benefits of the lower replication factor.

6 EXCESSIVE TRUST FOR SAFETY

Existing trust-bft protocols require some state to be persisted on the trusted hardware, corresponding to the logged requests or the counter values. These systems rely on this property to guarantee safety: despite any failures or attacks, these protocols expect this state to remain uncorrupted and available. Unfortunately, realizing this assumption on available implementations of trusted hardware is challenging. Intel SGX enclaves, for instance, are the most popular platform for trusted computing [3, 9, 45], but can suffer from power failures and rollback attacks. While recent works present solutions to mitigate these attacks, they remain limited in scope and have high costs [20]. On the other hand, hardware that has been shown to resist these attacks, such as SGX persistent counters [16] or TPMs [22], is prohibitively slow: it has upwards of tens of milliseconds of access latency and supports only a limited number of writes [16, 36, 43].

Figure 5: Loss of safety in trust-bft protocols (e.g., Pbft-EA) due to rollback of the trusted component's state (Section 6).

Persistent state is, as we show below, necessary for the correctness of trust-bft protocols; a rollback attack can cause a node to equivocate. Once equivocation is again possible in a trust-bft protocol, a single malicious node can cause a safety violation. To illustrate, consider the following run of the Pbft-EA protocol.

Assume a run of the Pbft-EA protocol (Figure 5). We know that f replicas are byzantine; we denote them as the set F. We distribute the remaining honest replicas into groups 𝐷 and 𝐺, such that |𝐷| = f and |𝐺| = 1.

Assume that the primary p is byzantine and all replicas in F intentionally fail to send replicas in 𝐷 any messages. As is possible in a partially synchronous system, we further assume that the Prepare and Commit messages from the replica in 𝐺 to those in 𝐷 are delayed. p asks its trusted component to generate an attestation for a transaction 𝑇 to be ordered at sequence number 1. Following this, p sends a Preprepare message for 𝑇 to all replicas in F and 𝐺. These f + 1 replicas are able to commit 𝑇; they execute 𝑇 and reply to the client. As a result, the client receives f + 1 identical responses and marks 𝑇 complete.

Now, assume that the byzantine primary p rolls back the state of its trusted component tp. Following this, p asks tp to generate an attestation for a transaction 𝑇′ to be ordered at sequence number 1. Next, p sends a Preprepare message for 𝑇′ to all replicas in 𝐷. Similarly, these replicas will be able to commit and execute 𝑇′, and the client will receive f + 1 responses. This is a safety violation: replicas in 𝐷 and 𝐺 have executed two different transactions at the same sequence number.

The naive solution to this issue would be to use TPMs at every node, as they do not suffer from rollback attacks. This solution would be safe, but prohibitive performance-wise [43]. We propose an alternative approach: by increasing the replication factor back to 3f + 1, we can design a protocol that remains safe while performing only a single access per transaction to safe but expensive hardware like TPMs. We describe this protocol next and show in the evaluation that this approach effectively minimizes the access-latency cost of using more expensive trusted hardware.

7 IN-ORDER CONSENSUS

trust-bft protocols are inherently sequential: they order client requests one at a time and cannot support parallel consensus invocations. While pipelining consensus phases [18] and creating large batches of requests [11, 29] can mitigate this limitation, sequential processing of requests creates an artificial bound on the throughput these protocols can achieve (batch size / (number of phases × RTT)) [26, 50]. This is in direct contrast to traditional bft protocols [8, 26, 41, 50], which allow invoking multiple consensus instances in parallel; their throughput is thus bound only by the available resources in the system. Hybster [6] attempts to mitigate this issue by allowing each of the n replicas to act as a primary in parallel, but each associated consensus invocation remains sequential.
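Plugging illustrative numbers (ours, not measurements) into this bound shows how binding it is: with a batch size of 100 requests, three consensus phases, and a 1 ms round trip per phase,

throughput ≤ batch size / (number of phases × RTT) = 100 / (3 × 1 ms) ≈ 33,333 txn/s,

no matter how many cores or how much bandwidth the replicas have; out-of-order protocols face no such ceiling.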

trust-bft protocols require that each message sent by any replica for a transaction 𝑇𝑗 be assigned the same sequence number. In the Pbft-EA protocol, for instance, no replica can call the Append function on the Prepare message for transaction 𝑇𝑗 before calling Append on the Prepare for 𝑇𝑖 if 𝑖 < 𝑗. We quantify this cost experimentally in Section 9 and show that, despite their lower replication factor, trust-bft protocols achieve lower throughput than their bft counterparts. Moreover, with 2f + 1 replicas, this limitation is fundamental. Specifically, we show that:

Claim 2. In a system S running a trust-bft protocol 𝑋, if a primary p assigns two transactions 𝑇𝑖 and 𝑇𝑗 sequence numbers 𝑖, 𝑗 with 𝑖 < 𝑗, then p cannot complete any phase of protocol 𝑋 for 𝑇𝑗 before completing that phase for 𝑇𝑖.

Proof. Assume that p allows the consensus invocations of 𝑇𝑖 and 𝑇𝑗 to proceed in parallel and that, as in any trust-bft protocol, the first phase of protocol 𝑋 is Preprepare. This implies that a replica r may receive the Preprepare message for 𝑇𝑗 before that for 𝑇𝑖. Further, on receiving a message, r calls the Append function to access its tr, which in turn attests this message for this specific order. In our case, r would call Append on 𝑇𝑗 (before 𝑇𝑖) and receive an attestation ⟨Attest(𝑞, 𝑗, 𝑇𝑗)⟩tr, which it forwards to all the replicas. Now, when r receives the Preprepare message for 𝑇𝑖, its attempt to call the Append function fails: tr ignores this message, as 𝑖 < 𝑗 and tr cannot process a lower sequence number. The consensus for 𝑇𝑖 will not complete, stalling progress. We have a contradiction. □
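The obstruction in this proof is mechanical. A minimal Python sketch (our own construction) of a trusted component that, like tr, never accepts a lower slot shows how a single out-of-order Append wedges the instance:

class MonotonicAppendOnly:
    """A trusted slot counter that never accepts a lower or equal slot."""

    def __init__(self):
        self.k = 0

    def append(self, k_new, msg):
        if k_new <= self.k:
            raise RuntimeError(f"slot {k_new} rejected: already at {self.k}")
        self.k = k_new
        return (k_new, msg)   # stands in for <Attest(q, k_new, msg)>_tr

tr = MonotonicAppendOnly()
tr.append(2, "Prepare for T_j")   # T_j's message arrives first and is attested
tr.append(1, "Prepare for T_i")   # raises: T_i can never be attested again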

8 FLEXITRUST PROTOCOLS

The previous sections highlighted several significant limitations of existing trust-bft approaches, all inherited from their lower replication factor. In this section, we make two claims. (i) 2f + 1 replicas are simply not enough: this replication factor either impacts responsiveness or requires an extra phase of all-to-all communication (§5), requires the use of slow persistent trusted counters for every message (§6), and forces sequential consensus decisions (§7). (ii) Trusted components are still beneficial to bft consensus if used with 3f + 1 replicas: they can, for instance, be used to reduce either the number of phases or the communication complexity.

To this effect, we present FlexiTrust, a new set of bft consensus protocols that make use of trusted components. These protocols satisfy the standard safety/liveness conditions described in Section 4, ensure liveness for clients, and, through the use of 3f + 1 replicas, achieve the following appealing performance properties:

(G1) Out-of-order Consensus. FlexiTrust protocols allow replicas to process messages out-of-order.

(G2) No Trusted Logging. FlexiTrust protocols expect low memory utilization at trusted components, as they do not require trusted logging.

(G3) Minimal Trusted Component Use. FlexiTrust protocols require accessing a single trusted component per transaction, rather than one per message.

We first give an overview of the three steps necessary to transform a trust-bft protocol into its FlexiTrust counterpart, before discussing two specific FlexiTrust protocols: Flexi-BFT, based on the MinBFT [52] protocol, and Flexi-ZZ, based on MinZZ [52].

8.1 Designing a FlexiTrust protocol

We make three modifications to trust-bft protocols. Together, these steps are sufficient to achieve significantly greater performance and better reliability.

First, we modify the Append functionality. Recall that participants use this function to bind a specific message to a counter value. The value of this counter can be supplied by the replica but must increase monotonically; no two messages can be bound to the same value. We restrict this function to preclude replicas from supplying their own sequence number, and instead have the trusted component increment counters internally, thus ensuring that counter values remain contiguous.

AppendF(𝑞, 𝑥) – Assume the 𝑞-th counter of tr has value 𝑘. This function increments 𝑘 to 𝑘 + 1, associates the new value with message 𝑥, and returns an attestation ⟨Attest(𝑞, 𝑘, 𝑥)⟩tr as proof of this binding.
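A Python sketch of AppendF, mirroring the trusted-role pseudocode of Figures 7 and 8 (the sign argument is a placeholder for the component's DS; the class is our illustration):

class FlexiTrustCounter:
    """AppendF: the component, not the caller, picks the next value."""

    def __init__(self, sign):
        self._sign = sign
        self._counters = {}   # q -> current value k

    def append_f(self, q, x):
        k = self._counters.get(q, 0) + 1             # contiguous: always k + 1
        self._counters[q] = k
        sigma = self._sign(f"{q}|{k}|{x}".encode())  # <Attest(q, k, x)>_tr
        return k, sigma

Because values are handed out contiguously, a byzantine primary can no longer create gaps that honest replicas must later fill with no-ops.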


Figure 6: Flow control of the Flexi-BFT protocol, where only the primary requires access to an active trusted component.

This change is necessary to support out-of-order consensus instances efficiently: while multiple transactions can be ordered in parallel, the execution of these transactions must still take place in sequence number order. A byzantine replica could stall the system's progress by issuing a sequence number that is far in the future, causing honest replicas to trigger frequent view-changes to "fill" the gap with no-ops [8].

Second, we reduce the number of calls to trusted counters. The primary is the only replica to solicit a new value from the trusted counter. All other participants simply validate the trusted counter's signature when receiving a message from the primary. Specifically, upon receiving a client request 𝑚 := 𝑇, the primary invokes AppendF(𝑞, 𝑚) to bind a unique counter value 𝑘 to 𝑚, which returns an attestation ⟨Attest(𝑞, 𝑘, 𝑚)⟩tp. The primary then forwards this attestation as part of its first consensus phase. Crucially, replicas, upon receiving this message, simply verify the validity of the signature but no longer access their trusted components.

Finally, we increase the quorum size necessary to proceed to the next phase of consensus to 2f + 1. This higher quorum size guarantees that any two quorums intersect in at least f + 1 distinct replicas (and thus in at least one honest replica). Forcing an honest replica to be part of every quorum makes accessing a trusted counter at every replica redundant, as this replica will, by definition, never equivocate.

8.2 Case Study: Flexi-BFT

We apply our transformations to MinBFT [52], a two-phase trust-bft protocol that makes use of trusted counters. MinBFT requires one less phase than Pbft and Pbft-EA (two phases total): as the primary cannot equivocate, it is safe to commit a transaction in MinBFT after receiving f + 1 Prepare messages. Flexi-BFT, the new protocol that we develop, preserves this property, but remains responsive and makes minimal use of trusted components (once per consensus). The view-change and checkpointing protocols remain identical to those of Pbft; we do not discuss them in detail here. We include pseudocode in Figure 7.

As stated, Flexi-BFT consists of two phases. Upon receiving a transaction 𝑇, the primary p of view 𝑣 requests that its trusted component generate an attestation for 𝑇. This attestation ⟨Attest(𝑞, 𝑘, 𝑚)⟩tp states that 𝑇 will be ordered at position 𝑘 (Line 6, Figure 7). The primary broadcasts a Preprepare with this proof to all replicas. When a replica r receives a valid Preprepare message in view 𝑣, it marks the transaction 𝑇 as prepared. Prepared transactions have the property that no conflicting transaction has been prepared for the same counter value in the same view. In the Pbft protocol, an additional round is necessary to mark messages as prepared, as replicas can equivocate.

Initialization:
// 𝑞 is the latest counter, with value 𝑘.
// 𝑣 is the view-number, determined by the identifier of the primary.

Client-role (used by client 𝑐 to process transaction 𝑇):
1: Send ⟨𝑇⟩𝑐 to the primary p.
2: Await receipt of messages Response(⟨𝑇⟩𝑐, 𝑘, 𝑣, 𝑟) from f + 1 replicas.
3: Consider 𝑇 executed, with result 𝑟, as the 𝑘-th transaction.

Primary-role (running at the primary p):
4: event p receives ⟨𝑇⟩𝑐 do
5:     Calculate digest Δ := Hash(⟨𝑇⟩𝑐).
6:     {𝑘, 𝜎} := AppendF(𝑞, Δ)
7:     Broadcast Preprepare(⟨𝑇⟩𝑐, Δ, 𝑘, 𝑣, 𝜎).

Non-Primary Replica-role (running at the replica r):
8: event r receives Preprepare(⟨𝑇⟩𝑐, Δ, 𝑘, 𝑣, 𝜎) from p such that:
       • the message is well-formed,
       • r did not accept a 𝑘-th proposal from p, and
       • the attestation 𝜎 is valid
   do
9:     Broadcast Prepare(Δ, 𝑘, 𝑣, 𝜎).

Replica-role (running at any replica r):
10: event r receives Prepare(Δ, 𝑘, 𝑣, 𝜎) messages from 2f + 1 replicas such that all the messages are well-formed and identical do
11:     if r has executed the transaction with sequence number 𝑘 − 1 ∧ 𝑘 > 0 then
12:         Execute 𝑇 as the 𝑘-th transaction.
13:         Let 𝑟 be the result of the execution of 𝑇 (if there is any result).
14:         Send Response(⟨𝑇⟩𝑐, 𝑘, 𝑣, 𝑟) to 𝑐.
15:     else
16:         Place 𝑇 in the queue for execution.

Trusted-role (running at the trusted component tp):
17: event tp is accessed through AppendF(𝑞, Δ) do
18:     Increment the 𝑞-th counter; 𝑘 := 𝑘 + 1
19:     𝜎 := Generate attestation ⟨Attest(𝑞, 𝑘, Δ)⟩tp.
20:     Return {𝑘, 𝜎}.

Figure 7: Flexi-BFT protocol (failure-free path).

The replica r then broadcasts a Prepare message in support of 𝑇 and includes the attestation. When r receives Prepare messages from 2f + 1 distinct replicas in the same view, it marks 𝑇 as committed. r executes 𝑇 once all transactions with sequence numbers smaller than 𝑘 have been executed. The client marks 𝑇 as complete when it receives matching responses from f + 1 replicas.

Replicas initiate the view-change protocol when they suspect that the primary has failed. As stated previously, the view-change logic is identical to that of Pbft; we only describe it briefly here. A replica in view 𝑣 enters the view-change by broadcasting a ViewChange message to all replicas. Each ViewChange message sent by a replica r includes all the valid prepared and committed messages received by r, with the relevant proof (the trusted component attestation for the Preprepare message, and the 2f + 1 Prepare messages for committed messages). The new primary starts the new view if it receives ViewChange messages from 2f + 1 replicas for view 𝑣 + 1.

8.3 Case Study: Flexi-ZZ

We now transform MinZZ [52] into another novel FlexiTrust protocol, Flexi-ZZ. MinZZ is a trust-bft protocol that follows the design proposed by Zyzzyva [30]. Zyzzyva introduces a bft consensus with a single-phase fast path (when all replicas are honest and respond) and a two-phase slow path. MinZZ uses trusted counters to reduce the replication factor from 3f + 1 to 2f + 1. The cost of transforming MinZZ into Flexi-ZZ is that, once again, we use 3f + 1 replicas.


Initialization:
// 𝑞 is the latest counter, with value 𝑘.

Client-role (used by client 𝑐 to process transaction 𝑇):
1: Send ⟨𝑇⟩𝑐 to the primary p.
2: Await receipt of messages Response(⟨𝑇⟩𝑐, 𝑘, 𝑣, 𝑟) from 2f + 1 replicas.
3: Consider 𝑇 executed, with result 𝑟, as the 𝑘-th transaction.

Primary-role (running at the primary p of view 𝑣):
4: event p receives ⟨𝑇⟩𝑐 do
5:     Calculate digest Δ := Hash(⟨𝑇⟩𝑐).
6:     {𝑘, 𝜎} := AppendF(𝑞, Δ)
7:     Broadcast 𝑚 := Preprepare(⟨𝑇⟩𝑐, Δ, 𝑘, 𝑣, 𝜎).
8:     Execute(𝑚).

Non-Primary Replica-role (running at the replica r):
9: event r receives 𝑚 := Preprepare(⟨𝑇⟩𝑐, Δ, 𝑘, 𝑣, 𝜎) from p such that:
       • the message is well-formed,
       • r did not accept a 𝑘-th proposal from p, and
       • the attestation 𝜎 is valid
   do
10:     Execute(𝑚).

Trusted-role (running at the trusted component tp):
11: event tp is accessed through AppendF(𝑞, Δ) do
12:     Increment the 𝑞-th counter; 𝑘 := 𝑘 + 1
13:     𝜎 := Generate attestation ⟨Attest(𝑞, 𝑘, Δ)⟩tp.
14:     Return {𝑘, 𝜎}.

15: function Execute(message: 𝑚)
16:     if r has executed the transaction with sequence number 𝑘 − 1 ∧ 𝑘 > 0 then
17:         Execute 𝑇 as the 𝑘-th transaction.
18:         Let 𝑟 be the result of the execution of 𝑇 (if there is any result).
19:         Send Response(⟨𝑇⟩𝑐, 𝑘, 𝑣, 𝑟) to 𝑐.
20:     else
21:         Place 𝑇 in the queue for execution.

Figure 8: Flexi-ZZ protocol (common-case).

However, there are several benefits: (1) Flexi-ZZ can always take the fast path, as it only requires n − f matching responses (compared to n for both MinZZ and Zyzzyva). This helps improve performance under byzantine attacks, which past work has demonstrated is an issue for Zyzzyva [13, 14, 26]. (2) Flexi-ZZ minimizes the use of trusted components: a single access to a trusted counter is required at the primary per consensus invocation. (3) Flexi-ZZ's view-change is significantly simpler than Zyzzyva's. View-change protocols are notoriously complex and error-prone [1, 21] to design and implement; a simple view-change protocol thus increases confidence in future correct deployments and implementations. We present pseudocode in Figure 8.

Common Case. A client 𝑐 submits a new transaction 𝑇 by sending a signed message ⟨𝑇⟩𝑐 to the primary p. The primary p invokes the AppendF(𝑞, 𝑚) function, binding the transaction to a specific counter value 𝑘 and obtaining an attestation ⟨Attest(𝑞, 𝑘, 𝑚)⟩tp as proof. This step prevents p from assigning the same sequence number 𝑘 to two conflicting messages 𝑚 and 𝑚′. The primary then forwards this attestation along with the transaction to all replicas. Replicas, upon receiving this message, execute the transaction in sequence order and reply directly to the client with the response. The client marks the transaction 𝑇 complete when it receives 2f + 1 identical responses in matching views.

View Change. If the client does not receive 2f + 1 matching responses, it re-broadcasts its transaction to all replicas; the primary may have been malicious and failed to forward its request. Upon receiving this broadcast request, a replica either (1) directly sends a response (if it has already executed the transaction 𝑇) or (2) forwards the request to the primary and starts a timer. If the timer expires before the replica receives a Preprepare message for 𝑇, it initiates a view-change. Specifically, the replica enters view 𝑣 + 1, stops accepting any messages from view 𝑣, and broadcasts a ViewChange message to all replicas. ViewChange messages include all requests for which r has received a Preprepare message.

Upon receiving ViewChange messages from 2f + 1 replicas in view 𝑣 + 1, the replica designated as the primary for view 𝑣 + 1 (say p′) creates a NewView message and broadcasts it to all replicas. This message includes: (1) the set of ViewChange messages received by the primary as evidence, and (2) the (sorted-by-sequence-number) list of transactions that may have committed. The primary p′ creates a new trusted counter¹ and sets it to the transaction with the lowest sequence number. p′ then re-proposes all transactions in this list, proposing specific no-op operations when there is a gap in the log between two re-proposed transactions.

To re-propose these transactions, the primary proceeds in the standard fashion: it accesses its trusted counter, obtains a unique counter value (setting the counter to the transaction with the lowest sequence number ensures that sequence numbers remain the same across views), and broadcasts the transaction and its attestation as part of a new Preprepare message. This mechanism guarantees that all transactions that could have been perceived by the client as committed (the client receiving 2f + 1 matching replies) will be re-proposed in the same order: for the client to commit an operation, it must receive 2f + 1 matching votes, and one of those votes is thus guaranteed to appear in the NewView message. Transactions that were executed by fewer than 2f + 1 replicas, on the other hand, may not be included in the new view, which may force some replicas to rollback.

In Appendix A, we prove the correctness of our protocols.

9 EVALUATION

In our evaluation, we answer the following questions:

• What are the costs of SGX accesses? (§9.3)
• How well do our protocols scale? (§9.4 and §9.5)
• What is the impact of batching requests? (§9.6)
• What is the impact of wide-area replication? (§9.7)
• What is the impact of failures? (§9.8)
• What is the impact of real-world adoption? (§9.9)

9.1 Implementation

We use the open-source ResilientDB fabric to implement all the consensus protocols [23, 25, 26, 41]. ResilientDB supports all standard bft optimizations, including multithreading at each replica and both client and server batching. The system relies on CMAC for MACs, ED25519 for DS-based signatures, and SHA-256 for hashing.

SGX Enclaves. We use Intel SGX for Linux [16] to implement the abstraction of a trusted component at each replica. Specifically, we implement multiple monotonically increasing counters inside each enclave, which can be concurrently accessed by multiple threads through the function GetSequenceNo(<counter-id>). This API call returns an attestation that includes the latest value of the specific counter and a DS that proves that this counter value was generated by the relevant trusted component. To highlight the potential of trusted components under the 3f + 1 regime, we implement counters inside the SGX enclave instead of leveraging Intel SGX Platform Services for trusted counters, as the latter have prohibitively high latency and are not available on most cloud providers. We highlight the trade-offs associated with the choice of trusted hardware in Section 9.9.
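The host-visible logic can be pictured with the following Python sketch; it only mimics the GetSequenceNo semantics described above and is not our actual SGX enclave code (which is written against the SGX SDK).

import threading
from cryptography.hazmat.primitives.asymmetric import ed25519

class EnclaveCounters:
    """Sketch: thread-safe monotonic counters with signed attestations."""

    def __init__(self):
        self._key = ed25519.Ed25519PrivateKey.generate()  # enclave signing key
        self._lock = threading.Lock()
        self._counters = {}

    def get_sequence_no(self, counter_id: int):
        with self._lock:   # concurrent worker threads serialize here
            k = self._counters.get(counter_id, 0) + 1
            self._counters[counter_id] = k
        payload = f"{counter_id}|{k}".encode()
        return k, self._key.sign(payload)   # DS over (counter id, latest value)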

¹ Existing trusted hardware allows creating new counters and provides a certificate that validates the identity of the new counter. The host replica must exchange this certificate with the other replicas [11, 33].


Figure 9: Impact of trusted counter (TC) and signature attestations (SA) on Pbft throughput (txn/s). [a] Standard Pbft protocol. [b] Primary (P) requires TC in Preprepare phase (Prep). [c] P requires TC and SA in Prep. [d] P requires TC and SA in all three phases. [e] All replicas require TC in Prep. [f] All replicas require TC and SA in Prep. [g] All replicas require TC and SA in all three phases.


9.2 Evaluation Setup

We compare our FlexiTrust protocols against six baselines: (i) Pbft [8], available with ResilientDB, as it outperforms BFTSmart's [7] implementation, which is single-threaded and sequential [23, 26, 41, 50]; (ii) Zyzzyva [30], a linear, single-phase bft protocol where the client expects responses from all 3f + 1 replicas; (iii) Pbft-EA [11], a three-phase trust-bft protocol; (iv) MinBFT [52], a two-phase trust-bft protocol; (v) MinZZ [52], a linear, single-phase trust-bft protocol where the client expects responses from all 2f + 1 replicas; and (vi) Opbft-ea, a variation of Pbft-EA created by us that supports out-of-order consensus invocations. We do not compare against streamlined bft protocols such as Hotstuff [56] or Damysus [18], as their chained nature precludes concurrent consensus invocations.

We use the Oracle Cloud Infrastructure (OCI) and deploy up to 97 replicas on VM.Standard.E3.Flex machines (16 cores and 16 GiB RAM) with 10 Gbps NICs. Unless explicitly stated, we use the following setup for our experiments. We select a medium-size setup where there can be up to f = 8 byzantine replicas; trust-bft and Opbft-ea thus require n = 17 replicas in total, while bft and FlexiTrust protocols need n = 25 replicas. We intentionally choose a higher f to maximize the potential cost of increasing the replication factor to 3f + 1. Each client runs in a closed loop; we use a batch size of 100. We run each experiment for 180 seconds with 60 seconds of warmup, and report average results over three runs.

We adopt the popular Yahoo Cloud Serving Benchmark (YCSB) [15, 19, 26]. YCSB generates key-value store operations that access a database of 600k records.

9.3 Trusted Counter Costs. In Figure 9, we quantify the costs of accessing trusted counters. To do so, we run a Pbft implementation with a single worker-thread. Bar [a] represents our baseline implementation; we report peak throughput numbers for each setup. Throughput degrades when the primary replica needs to access the trusted component (Bar [b]). This degradation accelerates when the primary replica requires the trusted component to perform signature attestations (Bar [c]), and when it must perform these operations during each phase of consensus (Bar [d]). The drop in throughput from [a] to [d] is nearly 2×. Bars [e] to [g] extend the use of trusted components to non-primary replicas. However, the relative throughput of the protocol remains the same, as the protocol is bottlenecked at the primary replica due to the higher number of messages it must process.

9.4 Throughput Results. In Figure 10(i), we increase the number of clients as follows: 4k, 8k, 24k, 48k, 64k, and 80k, and plot throughput versus latency for all protocols. Pbft-EA achieves the lowest throughput, as it requires three phases for consensus and disallows concurrent consensus invocations. The reduced replication factor of 2f + 1 does not help performance, as threads are already under-saturated: the system is latency-bound rather than compute-bound due to the protocol's sequential processing of consensus invocations. The Opbft-ea protocol attains up to 6% higher throughput (and lower latency) than Pbft-EA, as it supports concurrent request ordering but bottlenecks on trusted counter accesses at replicas, as Figure 9 suggests. Specifically, the replica's worker thread has to sign the outgoing message and perform two verifications on each received message: (i) the MAC of the received message, and (ii) the DS of the attestation from the trusted component. MinBFT and MinZZ achieve up to 47% and 68% higher throughput than Pbft-EA, respectively, as they reduce the number of phases necessary to commit an operation (from three to two for MinBFT, and from three to one in the failure-free case for MinZZ). Interestingly, Pbft yields better throughput than all trust-bft protocols. The combination of concurrent consensus invocations and the lack of overhead stemming from trusted counters drives this surprising result. Our Flexi-BFT and Flexi-ZZ protocols instead achieve up to 22% and 58% higher throughput than Pbft (and up to 87% and 77% higher throughput than MinBFT and MinZZ). This performance improvement stems from reducing the number of phases, accessing a trusted counter once per consensus invocation, and permitting concurrent consensus invocations.
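For intuition on where this per-message overhead comes from, the sketch below mirrors the two verifications described above. It is our simplification: the Message layout and function name are ours, and libsodium's crypto_auth (HMAC-based) stands in for the CMAC that the system actually uses.

```cpp
#include <sodium.h>

// Our schematic message layout: the payload, its MAC, and the trusted
// component's attestation (attested bytes plus DS) carried inside it.
struct Message {
  const unsigned char* payload;
  unsigned long long payload_len;
  unsigned char mac[crypto_auth_BYTES];
  const unsigned char* attested_bytes;   // e.g., (counter-id, value)
  unsigned long long attested_len;
  unsigned char attestation_sig[crypto_sign_BYTES];
};

// Work a replica's worker thread performs on each received message.
bool VerifyIncoming(const Message& m,
                    const unsigned char mac_key[crypto_auth_KEYBYTES],
                    const unsigned char tc_pubkey[crypto_sign_PUBLICKEYBYTES]) {
  // (i) Verify the MAC on the replica-to-replica channel (cheap).
  if (crypto_auth_verify(m.mac, m.payload, m.payload_len, mac_key) != 0)
    return false;
  // (ii) Verify the DS on the trusted-component attestation; this
  // public-key operation is the expensive step that, together with
  // signing outgoing messages, saturates the worker thread.
  return crypto_sign_verify_detached(m.attestation_sig, m.attested_bytes,
                                     m.attested_len, tc_pubkey) == 0;
}
```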

9.5 Scalability. Figures 10(ii) and 10(iii) summarize the protocols' scaling behavior as the number of replicas increases from 9 (f = 4) to 65 (f = 32) for trust-bft and Opbft-ea protocols, and from 13 to 97 for bft and FlexiTrust protocols. As expected, an increased replication factor leads to a proportional increase in the number of messages that are propagated and verified. This increased cost significantly degrades the throughput and latency of all protocols: going from f = 4 to f = 8 causes the performance of Pbft, Flexi-BFT, and Flexi-ZZ to drop by 3.89×, 2.48×, and 2.54×, respectively. MinBFT's and MinZZ's throughput also drops, by factors of 2.66× and 2.67×. This performance drop is larger for bft and FlexiTrust protocols than for trust-bft protocols, whose replicas are never fully saturated due to their sequential consensus invocations.

9.6 Batching. Figures 10(iv) and 10(v) quantify the impact of batching client requests; we increase the batch size from 10 to 5k. As expected, the throughput of all protocols increases with the batch size until the message size becomes too large, after which additional communication costs dominate.

9.7 Wide-area replication. For this experiment (Figures 10(vi) and 10(vii)), we distribute the replicas across five countries in six locations: San Jose, Ashburn, Sydney, Sao Paulo, Montreal, and Marseille, and use the regions in this order.


Figure 10: Throughput results as a function of number of clients, number of replicas, batch size, and cross-site latency, for Pbft-EA, MinBFT, MinZZ, Opbft-ea, Flexi-BFT, Flexi-ZZ, Pbft, and Zyzzyva. Panels: (i) Throughput vs. Latency; (ii) Scalability (Throughput); (iii) Scalability (Latency); (iv) Batch Size (Throughput); (v) Batch Size (Latency); (vi) Regions (Throughput); (vii) Regions (Latency). We set the number of replicas in f as n = 3f + 1 for bft and FlexiTrust protocols and n = 2f + 1 for trust-bft and Opbft-ea.

Figure 11: Impact of failure of one non-primary replica. Panels: One Faulty Replica (Throughput) and One Faulty Replica (Latency), both as a function of the number of replicas (in f).

We set f = 20, with n = 41 and n = 61 replicas for the 2f + 1 and 3f + 1 protocols, respectively.2 As we keep the number of replicas constant, we observe a slight dip in performance for all protocols until we move to 4 regions, after which performance remains unchanged: each replica only waits for messages from f + 1 (21) or 2f + 1 (41) other replicas, all of which arrive from the nearest 4 regions.
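To see why the nearest four regions suffice, consider a rough count (our arithmetic, assuming replicas are spread nearly evenly across the six regions):

```latex
\[
  n = 61,\; f = 20:\qquad
  \underbrace{\left\lceil \tfrac{4}{6} \cdot 61 \right\rceil}_{\text{replicas in the nearest four regions}}
  \;=\; 41 \;=\; 2f + 1,
\]
```

so even the largest quorum a replica waits for is served without hearing from the two farthest regions.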

9.8 Single Replica Failure. Next, we consider the impact of failures on our protocols (Figure 11). Unlike MinZZ and Zyzzyva, our Flexi-ZZ protocol's performance does not degrade, as it can handle up to f non-primary replica failures on the fast path. In contrast, both MinZZ and Zyzzyva require their clients to receive responses from all replicas; otherwise, they incur a second communication phase on the slow path.
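The contrast comes down to the fast-path quorum each client waits for. The decision rule below is our simplification of the behavior described above, not the protocols' exact pseudocode:

```cpp
enum class Path { Fast, Slow };

// Decide whether a client commits on the fast path, given how many
// matching replies arrived before its timer fired. Our simplification,
// contrasting Flexi-ZZ (n = 3f+1) with MinZZ (n = 2f+1).
Path DecidePath(int replies, int f, bool flexi_zz) {
  // Flexi-ZZ commits once n - f = 2f+1 replicas respond, so up to f
  // silent non-primary replicas cannot force the slow path; MinZZ
  // (like Zyzzyva) must hear from all its replicas to stay fast.
  const int n = flexi_zz ? 3 * f + 1 : 2 * f + 1;
  const int fast_quorum = flexi_zz ? 2 * f + 1 : n;
  return replies >= fast_quorum ? Path::Fast : Path::Slow;
}
```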

9.9 Real-World Adoption

This paper's objective is to highlight the current limitations of existing trust-bft approaches, be they hardware-related or algorithmic. Trusted hardware, however, is changing rapidly. Current SGX enclaves are subject to rollback attacks, but newer enclaves (Keystone, Nitro) may not be.

2As the LCM of the numbers in the range [1, 6] is 60, we set 3f + 1 = 61, which yields f = 20.

Access cost (in ms)    Throughput (in txn/s)
0.5                    87k
1.0                    87k
1.5                    67k
2.0                    50k
2.5                    40k
3.0                    34k
10                     10k
30                     3k
100                    993
200                    494

Figure 12: Peak throughput for the Flexi-ZZ protocol on varying the time taken to access a trusted counter while running consensus among 97 replicas.

Similarly, accessing current SGX persistent counters or TPMs takes between 80 ms and 200 ms for TPMs and between 30 ms and 187 ms for SGX [33, 38, 43]. New technology is rapidly bringing this cost down; counters like ADAM-CS [37] require less than 10 ms. Our final experiment investigates the performance of trust-bft protocols on both present and future trusted hardware. Our previous results were obtained using counters inside of the SGX enclave, as hardware providing access to SGX persistent counters and TPMs is not readily available on cloud providers. In this experiment, we gauge the impact on throughput and latency as we increase the time taken to access the trusted counter (Figure 12) in our Flexi-ZZ protocol. We select Flexi-ZZ as it already minimizes the use of trusted hardware. We run this experiment on 97 replicas and highlight that, for this setup, Pbft yields 40k txn/s. We find that Flexi-ZZ outperforms Pbft when the latency of the trusted component is less than 2.5 ms. This result highlights a path whereby, as trusted hardware matures, trust-bft protocols will become an appealing alternative to standard bft approaches.
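The shape of Figure 12 is consistent with a simple counter-bound model; this back-of-envelope check is our extrapolation, not a claim made by the experiment itself. If accesses to the trusted counter serialize at the primary, at most one batch of B = 100 requests commits per access:

```latex
\[
  \text{throughput} \;\lesssim\; \frac{B}{t_{\text{access}}},
  \qquad\text{e.g.,}\quad
  \frac{100}{100\,\text{ms}} = 1000\ \text{txn/s}\;(\text{measured: }993),\quad
  \frac{100}{200\,\text{ms}} = 500\ \text{txn/s}\;(\text{measured: }494).
\]
```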


10 RELATED WORK

There is a long line of research on designing efficient bft protocols [4, 5, 8, 23, 25, 26, 30, 41, 50, 56]. As stated, trust-bft protocols prevent replicas from equivocating, reducing the replication factor or the number of phases necessary to achieve safety. We summarized Pbft-EA in Section 4 and now describe other trust-bft protocols.

CheapBFT [29] uses trusted counters and optimizes for the failure-free case by reducing the amount of active replication to f + 1. When a failure occurs, however, CheapBFT requires that all 2f + 1 replicas participate in consensus. The protocol has the same number of phases as MinBFT; as highlighted in Sections 5 and 7, the protocol is inherently sequential and may not be responsive to clients.

Hybster [6] is a meta-protocol that takes as input an existing trust-bft protocol (such as Pbft-EA, MinBFT, etc.) and requires each of the n replicas to act as a parallel primary; a common deterministic execution framework consumes these local consensus logs to execute transactions in order. While multiple primaries improve concurrency, each primary locally invokes consensus in sequence, so each sub-log inherits the limitations of existing trust-bft protocols, and there remains an artificial upper bound on the amount of parallelism the system supports. Moreover, recent work shows that designing multi-primary protocols is hard, as f of these primaries can be byzantine and can collude to prevent liveness [25, 49].

Streamlined protocols like HotStuff-M [55] and Damysus [18] follow the design of HotStuff [56]. They linearize communication by splitting the all-to-all communication phases (Prepare and Commit) into two linear phases. These systems additionally rotate the primary after each transaction, requiring f + 1 replicas to send their last committed message to the next primary. The next primary then selects the committed message with the highest view number as the baseline for proposing the next transaction for consensus. HotStuff-M makes use of trusted logs; Damysus requires its replicas to have trusted components that support both logs and counters. Specifically, Damysus requires two types of trusted components at each replica: an accumulator and a checker. The primary leverages the accumulator to process incoming messages and create a certificate summarizing the last round of consensus. Each replica instead accesses the checker to generate sequence numbers using a monotonic counter and to log information about previously agreed transactions. These protocols once again suffer from a potential lack of responsiveness; their streamlined nature precludes opportunities to support any concurrency [26, 41].

Microsoft's CCF framework uses Intel SGX to support building confidential and verifiable services [45, 48]. CCF provides a framework that helps generate an audit trail for each request: each request is logged and attested by the trusted components. CCF provides the flexibility to deploy any consensus protocol.

In the specific case of blockchain systems, Teechain [34] designs a two-layer payment network with the help of SGX. Teechain designates trusted components as treasuries and only allows them to manage user funds. Teechain permits a subset of treasuries to be compromised, and it handles such attacks by requiring each fund to be managed by a group of treasuries. Ekiden [10] executes smart contracts directly in the trusted component for better privacy and performance.

11 CONCLUSION

In this paper, we identified three challenges with the design of existing trust-bft protocols: (i) they have limited responsiveness, (ii) they suffer from safety violations in the presence of rollback attacks, and (iii) their sequential nature artificially bounds throughput. We argue that returning to 3f + 1 is the key to fulfilling the potential of trusted components in bft. Our suite of protocols, FlexiTrust, supports parallel consensus instances, makes minimal use of trusted components, and reduces the number of phases necessary to safely commit operations, while also simplifying notoriously complex mechanisms like view-changes. In our experiments, FlexiTrust protocols outperform their bft and trust-bft counterparts by up to 100% and 185%, respectively.

REFERENCES

[1] Ittai Abraham, Guy Gueta, Dahlia Malkhi, Lorenzo Alvisi, Rama Kotla, and Jean-Philippe Martin. 2017. Revisiting Fast Practical Byzantine Fault Tolerance. https://arxiv.org/abs/1712.01367

[2] Elli Androulaki, Artem Barger, Vita Bortnikov, Christian Cachin, Konstantinos Christidis, Angelo De Caro, David Enyeart, Christopher Ferris, Gennady Laventman, Yacov Manevich, Srinivasan Muralidharan, Chet Murthy, Binh Nguyen, Manish Sethi, Gari Singh, Keith Smith, Alessandro Sorniotti, Chrysoula Stathakopoulou, Marko Vukolić, Sharon Weed Cocco, and Jason Yellick. 2018. Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains. In Proceedings of the Thirteenth EuroSys Conference. ACM, 30:1–30:15. https://doi.org/10.1145/3190508.3190538

[3] Sergei Arnautov, Bohdan Trach, Franz Gregor, Thomas Knauth, André Martin, Christian Priebe, Joshua Lind, Divya Muthukumaran, Dan O'Keeffe, Mark Stillwell, David Goltzsche, David M. Eyers, Rüdiger Kapitza, Peter R. Pietzuch, and Christof Fetzer. 2016. SCONE: Secure Linux Containers with Intel SGX. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). USENIX Association, 689–703.

[4] Pierre-Louis Aublin, Rachid Guerraoui, Nikola Knezevic, Vivien Quéma, and Marko Vukolic. 2015. The Next 700 BFT Protocols. ACM Trans. Comput. Syst. 32, 4 (2015), 12:1–12:45. https://doi.org/10.1145/2658994

[5] Pierre-Louis Aublin, Sonia Ben Mokhtar, and Vivien Quéma. 2013. RBFT: Redundant Byzantine Fault Tolerance. In Proceedings of the 2013 IEEE 33rd International Conference on Distributed Computing Systems. IEEE, 297–306. https://doi.org/10.1109/ICDCS.2013.53

[6] Johannes Behl, Tobias Distler, and Rüdiger Kapitza. 2017. Hybrids on Steroids: SGX-Based High Performance BFT. In Proceedings of the Twelfth European Conference on Computer Systems (EuroSys '17). Association for Computing Machinery, 222–237. https://doi.org/10.1145/3064176.3064213

[7] Alysson Bessani, João Sousa, and Eduardo E.P. Alchieri. 2014. State Machine Replication for the Masses with BFT-SMART. In 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks. IEEE, 355–362. https://doi.org/10.1109/DSN.2014.43

[8] Miguel Castro and Barbara Liskov. 2002. Practical Byzantine Fault Tolerance and Proactive Recovery. ACM Trans. Comput. Syst. 20, 4 (2002), 398–461. https://doi.org/10.1145/571637.571640

[9] Chia-che Tsai, Donald E. Porter, and Mona Vij. 2017. Graphene-SGX: A Practical Library OS for Unmodified Applications on SGX. In 2017 USENIX Annual Technical Conference (USENIX ATC 17). USENIX Association, Santa Clara, CA, 645–658.

[10] Raymond Cheng, Fan Zhang, Jernej Kos, Warren He, Nicholas Hynes, Noah Johnson, Ari Juels, Andrew Miller, and Dawn Song. 2019. Ekiden: A Platform for Confidentiality-Preserving, Trustworthy, and Performant Smart Contracts. In 2019 IEEE European Symposium on Security and Privacy (EuroS&P). 185–200. https://doi.org/10.1109/EuroSP.2019.00023

[11] Byung-Gon Chun, Petros Maniatis, Scott Shenker, and John Kubiatowicz. 2007. Attested Append-Only Memory: Making Adversaries Stick to Their Word. SIGOPS Oper. Syst. Rev. 41, 6 (2007), 189–204. https://doi.org/10.1145/1323293.1294280

[12] Allen Clement, Flavio Junqueira, Aniket Kate, and Rodrigo Rodrigues. 2012. On the (Limited) Power of Non-Equivocation. In Proceedings of the 2012 ACM Symposium on Principles of Distributed Computing. Association for Computing Machinery, 301–308. https://doi.org/10.1145/2332432.2332490

[13] Allen Clement, Manos Kapritsos, Sangmin Lee, Yang Wang, Lorenzo Alvisi, Mike Dahlin, and Taylor Riche. 2009. Upright Cluster Services. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. ACM, 277–290. https://doi.org/10.1145/1629575.1629602

[14] Allen Clement, Edmund Wong, Lorenzo Alvisi, Mike Dahlin, and Mirco Marchetti. 2009. Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults. In Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation. USENIX, 153–168.

[15] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking Cloud Serving Systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing. ACM, 143–154. https://doi.org/10.1145/1807128.1807152

[16] Victor Costan and Srinivas Devadas. 2016. Intel SGX Explained. Cryptology ePrint Archive, Report 2016/086. https://ia.cr/2016/086.

[17] Victor Costan, Ilia Lebedev, and Srinivas Devadas. 2016. Sanctum: Minimal Hardware Extensions for Strong Software Isolation. In 25th USENIX Security Symposium (USENIX Security 16). USENIX Association, Austin, TX, 857–874.

[18] Jérémie Decouchant, David Kozhaya, Vincent Rahli, and Jiangshan Yu. 2022. DAMYSUS: Streamlined BFT Consensus Leveraging Trusted Components. In Proceedings of the Seventeenth European Conference on Computer Systems. Association for Computing Machinery, New York, NY, USA, 1–16. https://doi.org/10.1145/3492321.3519568

[19] Tien Tuan Anh Dinh, Ji Wang, Gang Chen, Rui Liu, Beng Chin Ooi, and Kian-Lee Tan. 2017. BLOCKBENCH: A Framework for Analyzing Private Blockchains. In Proceedings of the 2017 ACM International Conference on Management of Data. ACM, 1085–1100. https://doi.org/10.1145/3035918.3064033

[20] Shufan Fei, Zheng Yan, Wenxiu Ding, and Haomeng Xie. 2021. Security Vulnerabilities of SGX and Countermeasures: A Survey. ACM Comput. Surv. 54, 6, Article 126 (2021). https://doi.org/10.1145/3456631

[21] Guy Golan Gueta, Ittai Abraham, Shelly Grossman, Dahlia Malkhi, Benny Pinkas, Michael Reiter, Dragos-Adrian Seredinschi, Orr Tamir, and Alin Tomescu. 2019. SBFT: A Scalable and Decentralized Trust Infrastructure. In 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 568–580. https://doi.org/10.1109/DSN.2019.00063

[22] Trusted Computing Group. 2019. Trusted Platform Module Library. https://trustedcomputinggroup.org/resource/tpm-library-specification/

[23] Suyash Gupta, Jelle Hellings, Sajjad Rahnama, and Mohammad Sadoghi. 2021. Proof-of-Execution: Reaching Consensus through Fault-Tolerant Speculation. In Proceedings of the 24th International Conference on Extending Database Technology.

[24] Suyash Gupta, Jelle Hellings, and Mohammad Sadoghi. 2021. Fault-Tolerant Distributed Transactions on Blockchain. Morgan & Claypool Publishers. https://doi.org/10.2200/S01068ED1V01Y202012DTM065

[25] Suyash Gupta, Jelle Hellings, and Mohammad Sadoghi. 2021. RCC: Resilient Concurrent Consensus for High-Throughput Secure Transaction Processing. In 37th IEEE International Conference on Data Engineering. IEEE. To appear.

[26] Suyash Gupta, Sajjad Rahnama, Jelle Hellings, and Mohammad Sadoghi. 2020. ResilientDB: Global Scale Resilient Blockchain Fabric. Proc. VLDB Endow. 13, 6 (2020), 868–883. https://doi.org/10.14778/3380750.3380757

[27] Suyash Gupta, Sajjad Rahnama, and Mohammad Sadoghi. 2020. Permissioned Blockchain Through the Looking Glass: Architectural and Implementation Lessons Learned. In Proceedings of the 40th IEEE International Conference on Distributed Computing Systems.

[28] Manuel Huber, Julian Horsch, and Sascha Wessel. 2017. Protecting Suspended Devices from Memory Attacks. In Proceedings of the 10th European Workshop on Systems Security (EuroSec '17). Association for Computing Machinery, New York, NY, USA, Article 10, 6 pages. https://doi.org/10.1145/3065913.3065914

[29] Rüdiger Kapitza, Johannes Behl, Christian Cachin, Tobias Distler, Simon Kuhnle, Seyed Vahid Mohammadi, Wolfgang Schröder-Preikschat, and Klaus Stengel. 2012. CheapBFT: Resource-Efficient Byzantine Fault Tolerance. In Proceedings of the 7th ACM European Conference on Computer Systems. Association for Computing Machinery, 295–308. https://doi.org/10.1145/2168836.2168866

[30] Ramakrishna Kotla, Lorenzo Alvisi, Mike Dahlin, Allen Clement, and Edmund Wong. 2009. Zyzzyva: Speculative Byzantine Fault Tolerance. ACM Trans. Comput. Syst. 27, 4 (2009), 7:1–7:39. https://doi.org/10.1145/1658357.1658358

[31] Leslie Lamport. 2001. Paxos Made Simple. ACM SIGACT News 32, 4 (2001), 51–58. https://doi.org/10.1145/568425.568433 Distributed Computing Column 5.

[32] Dayeol Lee, David Kohlbrenner, Shweta Shinde, Krste Asanović, and Dawn Song. 2020. Keystone: An Open Framework for Architecting Trusted Execution Environments. In Proceedings of the Fifteenth European Conference on Computer Systems (EuroSys '20). Association for Computing Machinery, New York, NY, USA, Article 38. https://doi.org/10.1145/3342195.3387532

[33] Dave Levin, John R. Douceur, Jacob R. Lorch, and Thomas Moscibroda. 2009. TrInc: Small Trusted Hardware for Large Distributed Systems. In 6th USENIX Symposium on Networked Systems Design and Implementation. USENIX Association.

[34] Joshua Lind, Oded Naor, Ittay Eyal, Florian Kelbert, Emin Gün Sirer, and Peter R. Pietzuch. 2019. Teechain: A Secure Payment Network with Asynchronous Blockchain Access. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP '19). ACM, 63–79. https://doi.org/10.1145/3341301.3359627

[35] Joshua Lind, Christian Priebe, Divya Muthukumaran, Dan O'Keeffe, Pierre-Louis Aublin, Florian Kelbert, Tobias Reiher, David Goltzsche, David M. Eyers, Rüdiger Kapitza, Christof Fetzer, and Peter R. Pietzuch. 2017. Glamdring: Automatic Application Partitioning for Intel SGX. In 2017 USENIX Annual Technical Conference (USENIX ATC 17). USENIX Association, 285–298.

[36] André Martin, Cong Lian, Franz Gregor, Robert Krahn, Valerio Schiavoni, Pascal Felber, and Christof Fetzer. 2021. ADAM-CS: Advanced Asynchronous Monotonic Counter Service. In 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 426–437. https://doi.org/10.1109/DSN48987.2021.00053

[37] André Martin, Cong Lian, Franz Gregor, Robert Krahn, Valerio Schiavoni, Pascal Felber, and Christof Fetzer. 2021. ADAM-CS: Advanced Asynchronous Monotonic Counter Service. In 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 426–437. https://doi.org/10.1109/DSN48987.2021.00053

[38] Sinisa Matetic, Mansoor Ahmed, Kari Kostiainen, Aritra Dhar, David Sommer, Arthur Gervais, Ari Juels, and Srdjan Capkun. 2017. ROTE: Rollback Protection for Trusted Execution. In Proceedings of the 26th USENIX Security Symposium (SEC '17). USENIX Association, 1289–1306.

[39] Satoshi Nakamoto. 2009. Bitcoin: A Peer-to-Peer Electronic Cash System. https://bitcoin.org/bitcoin.pdf

[40] Faisal Nawab. 2021. WedgeChain: A Trusted Edge-Cloud Store With Asynchronous (Lazy) Trust. In 37th IEEE International Conference on Data Engineering (ICDE). IEEE, 408–419. https://doi.org/10.1109/ICDE51399.2021.00042

[41] Ray Neiheiser, Miguel Matos, and Luís Rodrigues. 2021. Kauri: Scalable BFT Consensus with Pipelined Tree-Based Dissemination and Aggregation. In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles (SOSP '21). Association for Computing Machinery, New York, NY, USA, 35–48. https://doi.org/10.1145/3477132.3483584

[42] Diego Ongaro and John Ousterhout. 2014. In Search of an Understandable Consensus Algorithm. In Proceedings of the 2014 USENIX Annual Technical Conference. USENIX, 305–320.

[43] Bryan Parno, Jacob R. Lorch, John R. Douceur, James W. Mickens, and Jonathan M. McCune. 2011. Memoir: Practical State Continuity for Protected Modules. In 32nd IEEE Symposium on Security and Privacy (S&P 2011). IEEE Computer Society, 379–394. https://doi.org/10.1109/SP.2011.38

[44] Christian Priebe, Divya Muthukumaran, Joshua Lind, Huanzhou Zhu, Shujie Cui, Vasily A. Sartakov, and Peter R. Pietzuch. 2019. SGX-LKL: Securing the Host OS Interface for Trusted Execution. CoRR abs/1908.11143 (2019). arXiv:1908.11143

[45] Mark Russinovich, Edward Ashton, Christine Avanessians, Miguel Castro, Amaury Chamayou, Sylvan Clebsch, Manuel Costa, Cédric Fournet, Matthew Kerner, Sid Krishna, Julien Maffre, Thomas Moscibroda, Kartik Nayak, Olya Ohrimenko, Felix Schuster, Roy Schwartz, Alex Shamis, Olga Vrousgou, and Christoph M. Wintersteiger. 2019. CCF: A Framework for Building Confidential Verifiable Replicated Services. Technical Report MSR-TR-2019-16. Microsoft. https://www.microsoft.com/en-us/research/publication/ccf-a-framework-for-building-confidential-verifiable-replicated-services/

[46] J. H. Saltzer and M. D. Schroeder. 1975. The Protection of Information in Computer Systems. Proc. IEEE 63, 9 (1975), 1278–1308. https://doi.org/10.1109/PROC.1975.9939

[47] Vasily A. Sartakov, Daniel O'Keeffe, David M. Eyers, Lluís Vilanova, and Peter R. Pietzuch. 2021. Spons & Shields: Practical Isolation for Trusted Execution. In 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE '21). ACM, 186–200. https://doi.org/10.1145/3453933.3454024

[48] Alex Shamis, Peter Pietzuch, Burcu Canakci, Miguel Castro, Cedric Fournet, Edward Ashton, Amaury Chamayou, Sylvan Clebsch, Antoine Delignat-Lavaud, Matthew Kerner, Julien Maffre, Olga Vrousgou, Christoph M. Wintersteiger, Manuel Costa, and Mark Russinovich. 2022. IA-CCF: Individual Accountability for Permissioned Ledgers. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22). USENIX Association, Renton, WA, 467–491.

[49] Chrysoula Stathakopoulou, Matej Pavlovic, and Marko Vukolić. 2022. State Machine Replication Scalability Made Simple. In Proceedings of the Seventeenth European Conference on Computer Systems. Association for Computing Machinery, New York, NY, USA, 17–33. https://doi.org/10.1145/3492321.3519579

[50] Florian Suri-Payer, Matthew Burke, Zheng Wang, Yunhao Zhang, Lorenzo Alvisi, and Natacha Crooks. 2021. Basil: Breaking up BFT with ACID (Transactions). In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles (SOSP '21). Association for Computing Machinery, New York, NY, USA, 1–17. https://doi.org/10.1145/3477132.3483552

[51] Jörg Thalheim, Harshavardhan Unnibhavi, Christian Priebe, Pramod Bhatotia, and Peter R. Pietzuch. 2021. rkt-io: A Direct I/O Stack for Shielded Execution. In Sixteenth European Conference on Computer Systems (EuroSys '21). ACM, 490–506. https://doi.org/10.1145/3447786.3456255

[52] Giuliana Santos Veronese, Miguel Correia, Alysson Neves Bessani, Lau Cheuk Lung, and Paulo Verissimo. 2013. Efficient Byzantine Fault-Tolerance. IEEE Trans. Comput. 62, 1 (2013), 16–30. https://doi.org/10.1109/TC.2011.221

[53] Wenbin Wang, Chaoshu Yang, Runyu Zhang, Shun Nie, Xianzhang Chen, and Duo Liu. 2020. Themis: Malicious Wear Detection and Defense for Persistent Memory File Systems. In 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS). 140–147. https://doi.org/10.1109/ICPADS51040.2020.00028

[54] Gavin Wood. 2015. Ethereum: A Secure Decentralised Generalised Transaction Ledger. http://gavwood.com/paper.pdf

[55] Sravya Yandamuri, Ittai Abraham, Kartik Nayak, and Michael Reiter. 2021. Brief Announcement: Communication-Efficient BFT Using Small Trusted Hardware to Tolerate Minority Corruption. In 35th International Symposium on Distributed Computing (DISC 2021) (LIPIcs, Vol. 209). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 62:1–62:4. https://doi.org/10.4230/LIPIcs.DISC.2021.62

[56] Maofan Yin, Dahlia Malkhi, Michael K. Reiter, Guy Golan Gueta, and Ittai Abraham. 2019. HotStuff: BFT Consensus with Linearity and Responsiveness. In Proceedings of the ACM Symposium on Principles of Distributed Computing. ACM, 347–356. https://doi.org/10.1145/3293611.3331591

A APPENDIX

We first prove that in FlexiTrust protocols, no two honest replicas will execute two different requests at the same sequence number.

Theorem 2. Let r𝑖, 𝑖 ∈ {1, 2}, be two honest replicas that executed ⟨𝑇𝑖⟩𝑐𝑖 as the 𝑘-th transaction of a given view 𝑣. If n > 3f, then ⟨𝑇1⟩𝑐1 = ⟨𝑇2⟩𝑐2.

Proof. Replica r𝑖 only executed ⟨𝑇𝑖⟩𝑐𝑖 after r𝑖 received a well-formed Preprepare message Preprepare(⟨𝑇⟩𝑐, Δ, 𝑘, 𝑣, 𝜎) from the primary p. This message includes an attestation 𝜎 = ⟨Attest(𝑞, 𝑘, Δ)⟩tp from tp, which we assume cannot be compromised. Let 𝑆𝑖 be the set of replicas that received the Preprepare message for ⟨𝑇𝑖⟩𝑐𝑖, and let 𝑋𝑖 = 𝑆𝑖 \ F be the honest replicas in 𝑆𝑖. As |𝑆𝑖| = 2f + 1 and |F| = f, we have |𝑋𝑖| ≥ 2f + 1 − f = f + 1. The honest replicas in 𝑋𝑖 will only execute the 𝑘-th transaction in view 𝑣 if it carries an attestation from the trusted component at the primary. If ⟨𝑇1⟩𝑐1 ≠ ⟨𝑇2⟩𝑐2, then 𝑋1 and 𝑋2 cannot overlap, as the trusted component never assigns the same sequence number to two different transactions. Hence, |𝑋1 ∪ 𝑋2| ≥ 2(f + 1) = 2f + 2; however, there are only n − f = 2f + 1 honest replicas in total, which contradicts n > 3f. Hence, we conclude ⟨𝑇1⟩𝑐1 = ⟨𝑇2⟩𝑐2. □
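For clarity, the counting step in display form (a restatement of the argument above):

```latex
\[
  |X_i| \;\ge\; |S_i| - |\mathcal{F}| \;=\; (2f+1) - f \;=\; f+1,
\]
\[
  X_1 \cap X_2 = \emptyset \;\Longrightarrow\;
  |X_1 \cup X_2| \;\ge\; 2(f+1) \;=\; 2f+2 \;>\; 2f+1 \;=\; n-f,
\]
% i.e., the two disjoint quorums would need more honest replicas than
% exist when n = 3f+1, a contradiction.
```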

Next, we show that Flexi-BFT guarantees a safe consensus.

Theorem 3. In a system S = {R, C} where |R| = n = 3f + 1, the Flexi-BFT protocol guarantees a safe consensus.

Proof. If the primary p is honest, then from Theorem 2 we can conclude that no two replicas will execute different transactions for the same sequence number. This implies that all the honest replicas will execute the same transaction per sequence number. If a transaction is executed by at least f + 1 honest replicas, then it will persist across views, as in any view-change quorum of 2f + 1 replicas there will be at least one honest replica that has executed this request and has a valid Preprepare message and 2f + 1 Prepare messages corresponding to this request.

If the primary p is byzantine, it can only withhold the Preprepare messages from a subset of replicas; p cannot equivocate, as it does not assign sequence numbers. For each transaction 𝑇, p needs to access its tp, which returns a sequence number 𝑘 and an attestation that 𝑘 binds 𝑇. Further, Theorem 2 proves that, for a given view, no two honest replicas will execute different transactions for the same sequence number. So, a byzantine p can send the Preprepare for 𝑇 (i) to at least 2f + 1 replicas, or (ii) to fewer than 2f + 1 replicas. In either case, any replica r that receives 𝑇 will send a Prepare message. If r receives 2f + 1 Prepare messages, it will execute 𝑇 and reply to the client. Any remaining replica that did not receive 𝑇 will eventually time out waiting for a request and trigger a ViewChange. If at least f + 1 replicas time out, a ViewChange will take place.

If 𝑇 was prepared by at least f + 1 honest replicas, then this request will be part of the subsequent view. Otherwise, the subsequent view may or may not include 𝑇. This is not an issue, because such a transaction was not executed by any honest replica: no replica would have received 2f + 1 Prepare messages. Hence, the system is safe even if this transaction is forgotten.

Each new view 𝑣 + 1 is led by the replica with identifier 𝑖, where 𝑖 = (𝑣 + 1) mod n. The new primary waits for ViewChange messages from 2f + 1 replicas, uses these messages to create a NewView message, and forwards it to all the replicas. This NewView message includes a list of requests for each sequence number present in the ViewChange messages. The new primary needs to set its counter to the lowest sequence number of this list (and may need to create a new counter). After sending the NewView message, the new primary re-proposes the Preprepare message for each request in the NewView. Each replica, on receiving the NewView message, can verify its contents. □
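The new-primary steps in the paragraph above can be summarized in a short sketch; all type and function names are ours, not FlexiTrust's actual interfaces, and well-formedness checks on the ViewChange messages are elided.

```cpp
#include <cstdint>
#include <map>
#include <vector>

using Request = uint64_t;  // stand-in for a request digest

struct ViewChangeMsg { std::map<uint64_t, Request> prepared; };  // seq -> request
struct NewViewMsg { uint64_t view; std::map<uint64_t, Request> requests; };

// Build the NewView message from 2f+1 ViewChange messages.
NewViewMsg BuildNewView(uint64_t next_view,
                        const std::vector<ViewChangeMsg>& vcs) {
  NewViewMsg nv{next_view, {}};
  for (const auto& vc : vcs)
    for (const auto& [seq, req] : vc.prepared)
      nv.requests.emplace(seq, req);  // first report wins in this sketch
  // The new primary then resets (or creates) its trusted counter at the
  // lowest sequence number in nv.requests, broadcasts nv, and re-proposes
  // a Preprepare for every request it contains.
  return nv;
}
```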

Next, we show that Flexi-ZZ guarantees a safe consensus.

Theorem 4. In a system S = {R, C} where |R| = n = 3f + 1, the Flexi-ZZ protocol guarantees a safe consensus.

Proof. If the primary p is honest, then from Theorem 2 we can conclude that no two replicas will execute different transactions for the same sequence number. This implies that all the honest replicas will execute the same transaction per sequence number. If a transaction is executed by at least f + 1 honest replicas, then it will persist across views, as in any view-change quorum of 2f + 1 replicas there will be at least one honest replica that has executed this request and has a valid Preprepare message.

If the primary p is byzantine, it can only withhold the Preprepare messages from a subset of replicas; p cannot equivocate, as it does not assign sequence numbers. For each transaction 𝑇, p needs to access its tp, which returns a sequence number 𝑘 and an attestation that 𝑘 binds 𝑇. Further, Theorem 2 proves that, for a given view, no two honest replicas will execute different transactions for the same sequence number. So, a byzantine p can send the Preprepare for 𝑇 (i) to at least 2f + 1 replicas, or (ii) to fewer than 2f + 1 replicas. In either case, any replica that receives 𝑇 will execute it and reply to the client, while the remaining replicas will eventually time out waiting for a request and trigger a ViewChange. If at least f + 1 replicas time out, a ViewChange will take place. If 𝑇 was executed by at least f + 1 honest replicas, then this request will be part of the subsequent view. Otherwise, the subsequent view may or may not include 𝑇; in that case, any replica that executed 𝑇 is required to roll back its state. However, for any request, if the client receives 2f + 1 responses, the request will persist across views, because at least f + 1 honest replicas have executed it.

Each new view 𝑣 + 1 is led by the replica with identifier 𝑖, where 𝑖 = (𝑣 + 1) mod n. The new primary waits for ViewChange messages from 2f + 1 replicas, uses these messages to create a NewView message, and forwards it to all the replicas. This NewView message includes a list of requests for each sequence number present in the ViewChange messages. The new primary needs to set its counter to the lowest sequence number of this list (and may need to create a new counter). After sending the NewView message, the new primary re-proposes the Preprepare message for each request in the NewView. Each replica, on receiving the NewView message, can verify its contents. □