Masking the energy behaviour of encryption algorithms
-
Upload
independent -
Category
Documents
-
view
1 -
download
0
Transcript of Masking the energy behaviour of encryption algorithms
Masking the Energy Behavior of Encryption Algor ithms
H. Saputra, N. Vijaykrishnan, M. Kandemir, M. J. Irwin, R. Brooks†
Computer Science and Engineering, Applied Research Lab†
The Pennsylvania State University
Emails: { saputra, vijay, kandemir, mji} @cse.psu.edu, [email protected]†
Abstract
Smart cards are vulnerable to both invasive and non-invasive attacks. Specifically, non-invasive
attacks using power and timing measurements to extract the cryptographic key has drawn a lot of negative
publicity for smart card usage. The power measurement techniques rely on the data-dependent energy
behavior of the underlying system. Further, power analysis can be used to identify the specific portions of
the program being executed to induce timing glitches that may in turn help to bypass key checking. Thus, it
is important to mask the energy consumption when executing the encryption algorithms.
In this work, we augment the instruction set architecture of a simple five-stage pipelined smart card
processor with secure instructions to mask the energy differences due to key-related data-dependent
computations in DES and Rijndael encryptions. The secure versions operate on the normal and
complementary versions of the operands simultaneously to mask the energy variations due to value
dependent operations. However, this incurs the penalty of increased overall energy consumption in the
data-path components. Consequently, we employ secure versions of instructions only for critical
operations; that is we use secure instructions selectively, as directed by an optimizing compiler. Using a
cycle-accurate energy simulator, we demonstrate the effectiveness of this enhancement. Our approach
achieves the energy masking of critical operations consuming 83% less energy as compared to existing
approaches employing dual rail circuits.
NOTE TO REFEREES: This submission is an extended version of a DATE’03 paper, entitled “Masking
the Energy Behavior of DES Encryption” [21]. This submission extends the DATE’03 paper by including
energy masking of Rijndael algorithm and that of randomization
1. INTRODUCTION With increased usage of smart cards, the financial incentive for security attacks becomes attractive. For
example, the smart card usage in North America surged by 37% in 2000, particularly in the financial
segment where security is a prime issue. There are various classes of security attacks that can be broadly
classified as: microprobing, software attacks, eavesdropping and fault generation [15]. Microprobing is an
invasive technique that involves physical manipulation of the smart card circuit after opening the package.
Software attacks focus on protocol or algorithm weaknesses, while eavesdropping techniques hack the
secret keys and information by monitoring the power consumption, electromagnetic radiation and
execution time. The fault generation techniques induce an intentional malfunction of the circuit by using
techniques such as varying the supply voltage, inducing clock glitches, exposing the circuit to ionizing
radiation, etc. Our focus in this work is on addressing the information leak due to eavesdropping power
profile.
Power analysis attack is based on analyzing the power consumption of an operation. The main
rationale behind this kind of an attack is that the power consumption of an operation depends on the inputs
(in the case of cryptography, the inputs are plaintext and secret key). The differences in values of the
operands being operated on result in different switching activities in the memory, buses, datapath units
(adders, multipliers, logical units) and pipeline registers of the smart card processor. Among these
components, the processor datapath and buses exhibit more data-dependent energy variation as compared
to memory components [16].
There are different degrees of sophistication involved in such power analysis based attacks. Simple
Power Analysis (SPA) [7] uses only a single power consumption trace for an operation. From this power
trace, an attacker can identify the operations being performed (such as whether a branch at point p is taken
or not or whether an exponentiation operation is performed). Using this power information and by knowing
the underlying algorithm being implemented, such information can reveal the secret key. For example,
when a branch is taken based on a particular bit of a secret key being zero, the attacker can identify this bit
by monitoring the power consumption difference between a taken and not taken branch. Protecting against
this type of simple attack can be achieved fairly easily by restructuring the code. For example, a
restructured algorithm is provided in [3] to eliminate branch conditions that were initially revealing the
secret key information. Also, techniques that randomly introduce noise into the power measurement can
mislead simple power analysis. An example of such technique involves adding dummy modules and
activating them at random time intervals. These modules will consume additional power skewing the
original power profile. However, such techniques only provide protection from straightforward hacking
techniques. Higher-order power analysis techniques can be used to circumvent these protection
mechanisms.
Differential power analysis (DPA) is currently the most popular higher-order power analysis. This
scheme utilizes power profiles gathered from several runs and relies on the data-dependent power
consumption variation to break the key [7]. In [5], Goubin et al. show how the secret key is guessed by
using 1000 sample inputs and their corresponding 1000 power consumption traces. Then, a mean of all
these power consumption traces represented as M is obtained. Next, the hacker guesses a particular key and
based on the input determines a theoretical value for one of the intermediate bits (b) generated by the
program. Then, the outcome of this bit is used to separate the 1000 inputs into two groups (G1 and G2)
based on whether b=0 or b=1. If the mean of the power profiles in Group G1 is significantly different from
that of M, this indicates that the guess was correct. This difference is a manifestation of the consequent
downstream computational power differences that used the bit b. As evident from the above discussion,
random noises in power measurements can be filtered through the averaging process using a large number
of samples. However, the use of random noises can increase the number of samples to an infeasible number
to be of practical concern.
In this work, we propose to counter DPA by modifying the underlying circuit to make its power
consumption independent of the operand data using dual rail logic. However, the naïve use of such logic to
mask the data dependent energy behavior to protect the secret keys has a significant energy penalty. Our
goal is to limit the use of the additional components required for energy masking for selective operations.
The key to identify the operations that require energy masking is an optimizing compiler. This compiler
allows users to annotate certain critical variables. Whenever these critical variables or values derived from
them are operated on, it is important to mask any data dependent energy behavior. This is because the
energy profile can be used to reveal sensitive data (e.g., key values in encryption algorithms) without any
energy masking. The compiler uses the secure instructions for all operations that directly operate on the
annotated variables. Further, it uses a technique called forward slicing to identify other critical operations
that depend on the annotated ones. Forward slicing is required to prevent indirect information leak from
operations that do not operate on the sensitive variables directly.
We evaluate the proposed scheme focusing on the DES and Rijndael algorithms using a detailed cycle-
accurate power analysis with the help of an energy estimation framework. Note that our approach is
general and can be extended to other algorithms that need protection against current measurements based
breaks. We measure the energy consumed in every cycle in picoJoules throughout the paper. The use of the
simulator provides a far greater control of the granularity of information than would be practically possible
for a hacker using physical power measurement tools. Thus, our approach is conservative and factors in
possible improvements in measurement fidelity. Our experiments show that the proposed approach based
on selective use of secure instructions achieves the energy masking of critical operations consuming 83%
less energy as compared to existing approaches that apply masking to all operations due to the lack of any
compiler analysis.
The remainder of this paper is organized as follows. In Section 2, we discuss related work on this
topic. In Section 3, we revise the DES algorithm. In Section 4, we present our masking method. This
section gives explanation about our secure instructions and our modified architecture. We discuss the
Rijndael algorithm and the associated experimental results in Section 5. Another approach to masking
energy behavior, which uses randomization, is explained in Section 6. Finally, conclusions and a brief
outline of future work are given in Section 7.
2. RELATED WORK
Various researchers have investigated the potential for both invasive and non-invasive attacks against
smart cards. An overview of these techniques is presented in [4, 15]. In particular, the retrieval of secret
information from a smart card through its leakage called side-channel analysis poses an important class of
attack. Analyzing the power profile of an encryption process is one of the popular side-channel attacks.
Kocher et al. [7] provide a detailed description of the simple power analysis (SPA) and differential power
analysis (DPA) techniques. The difference between these two attacks is that DPA is more sophisticated and
involves statistical analysis using a larger sample set.
There have been prior attempts to address the SPA and DPA attacks [2,3,5,6,7,12]. These counter
measures can be classified into three types as performed in [5]. First, random timing shifts and noises can
be added such that computed means for power consumption do not correspond to the same instruction.
However, the difficulty in the protection process is to ensure such random techniques are not vulnerable to
statistical treatment using large samples. While complete protection is difficult to achieve using such a
counter measure, it could make it infeasible for an attacker to break the key. A new approach for
introducing noise is presented in [20]. The authors use two different types of modules and select randomly
at any cycle one of them such that the energy consumption trace becomes different from the trace we have
if we only used a single type of module. The second counter measure technique is to modify the underlying
software implementation or algorithm [3, 4, 5]. For instance, the use of non-linear transformations of S-box
operations is proposed in [5] to avoid some DPA attacks. However, it is observed in [4] that software
counter measures may be difficult to design.
The third form of countermeasure involves replacing some of the critical instructions such that their
energy consumption behavior does not leak any information. Our approach falls into this category. Dual-
rail logic has been identified as one of the most promising ways to provide protection to SPA and DPA [4].
The use of dual-rail encoding and self-timed circuits is also proposed in [12]. The use of self-timed circuits
can alleviate problems arising from clock glitches induced in synchronous circuits while the dual-rail logic
masks the power consumption differences due to bit variations. Our contribution is the actual modification
of an embedded processor to implement this technique and the addition of secure instructions to its
instruction set. However, the use of dual-rail logic can increase overall power consumption by almost two
times as observed in our experiments. In fact, power is an important constraint for smart card markets such
as the GSM industry and specific constraints on maximum power are imposed by the GSM specification
for different supply voltages [4]. Tiri et al [19] recently introduced new type of logic that consumes the
same energy independent of the specific input values applied. The logics charge the same amount of
capacitance at any time which is the same amount as the total of the balanced load capacitance and the sum
of all internal load capacitances.
3. DATA ENCRTYPTION STANDARD
Data Encryption Standard (DES) is one of many symmetric cryptography algorithms that are widely
used. DES [1] uses 64 binary bits for the key, of which 56 bits are used in the algorithm while the other 8
bits are used for error detection. Similar to other cryptography algorithms, the security of the data depends
on the security provided by the key used to encrypt or decrypt the data. Although the DES algorithm is
public, one will not be able to decrypt the cipher unless they know the secret key used to encrypt that
cipher.
The plaintext is permuted first before it goes to the encryption process. The core of the DES
encryption process consists of 16 identical rounds, each of which has its own sub secret key, called Kn
(n=1,2, ..16). Each Kn is 48 bits produced from the original secret key using key-permutation and shift
operations. Each round consists of 8 S-Box operations which use the 48-bit input derived from the
permutated input and sub secret key Kn (see Figure 1). Each S-box operation takes 6 bits from the 48-bit
input that is divided into 8 blocks. These 6-bits are used to index a table. This table look up operation
consequently generates 4 bits (for each S-Box). These 32 output bits (4 x 8 S-Boxes) are then used as an
input combination for the next round.
Figure 1. S-Box Operations (f(R,K)) [1]
The complete encryption process of DES is shown in Figure 2 (left side). Here, (+) denotes bit-by-bit
addition modulo 2. From this algorithm, we can identify the operations that can reveal the secret key.
These operations are key permutation, left-side and right-side assignment operations and key generation.
Next, we need to identify the underlying instructions used to implement operations and consider their
secure versions.
(a) Or iginal DES operations (b) Modified DES operations
Figure 2. Modified DES Algor ithm
4. ENERGY MASKING FOR DES
4.1 Overview
Our energy masking approach is based on eliminating the input dependencies of an operation. Our
approach is focused on four types of operations that are critical in the DES encryption: assignment
operation, bit-by-bit addition modulo two (XOR) operation, shift operation and indexing operation. In our
approach, we do not mask all the operations, but only the operations that use the secret key and those
operations that use the data generated from prior secure operations. The compiler analyzes the code and
identifies how these variables are used within the code. Then, for the operators that work on these
variables, the compiler employs secure versions of the corresponding instructions. It should be emphasized
that it is not sufficient to protect only the sensitive variables annotated by the programmer. This is because
the variables whose values are determined based on the values of the protected variables can also be
exploited to leak information. Consequently, such variables also need to be protected. We achieve this
using a technique called forward slicing [11]. In forward slicing, given a set of variables and/or instructions
(called seeds), the compiler determines all the variables/instructions whose values depend on the seeds.
The complexity of this process is bounded by the number of edges of the control flow graph of the code
Data Initial Permutation (L0,R0) = PermuteIP(Data) Key Permutation (C0,D0) = PermuteK1(Key) = denotes insecure assignment
Data Initial Permutation (L0,R0) = PermuteIP(Data) Key Permutation (C0,D0) � PermuteK1(Key) � denotes secure assignment
M th Rounds Left Side Operation Lm = Rm-1 M th Key Generation Cm = Rotate(Cm-1,n) Dm = Rotate(Dm-1,n) Km =PermuteK2(Cm,Dm) Right Side Operation E(R )= PermuteE(Rm-1) f(Rm-1,K) = S(E(R)(+) Km) Rm = Lm-1 (+) f(Rm-1,K)
M th Rounds Left Side Operation Lm � Rm-1 M th Key Generation Cm � Rotate(Cm-1, n) Dm � Rotate(Dm-1,n) Km � PermuteK2(Cm,Dm) Right Side Operation E(R)� PermuteE(Rm-1) f(Rm-1,K) � S(E(R) <+> Km) Rm � Lm-1 <+> f(Rm-1,K)
Output Inverse Permutation Output=PermuteIP-1(R16,L16)
Output Inverse Permutation Output=PermuteIP-1(R16,L16)
being analyzed. After all the variables whose values are affected by the seeds are determined, the compiler
uses secure instructions to protect them.
Figure 2 shows how we modified the DES operations. Figure 2(a) shows the original DES operations.
The first step is initial permutation of the plaintext. This operation does not use any secret key and hence
does not require being secure.
The next operation is the key permutation. This operation obviously needs to be secure. Figure 2(b)
shows how we modify this operation. In this figure, the symbol “=” corresponds to the original assignment
(i.e., insecure assignment), and the symbol “�” indicates that the assignment is secure.
The next step contains the operations within each round. Since some of these operations require the
secret key, and the operations are repeated in every round using the data generated from the previous
round, we need to secure all operations inside this block. Note that the modified left side operation uses a
secure assignment operation, although it does not operate on the secret key directly. This is because it uses
the data generated from the previous round (for �2nd round) that uses the secret key. In the right side
operation, all the instructions need to be secure. Each round uses four types of secure operations: they are
secure assignment, secure shift, secure bit-by-bit addition modulo two and secure indexing. Note that the S
symbol in the figure represents the S-Box operation.
The last operation is the output inverse permutation. This operation does not need any secure
instruction although it uses data generated from secure instructions as it reveals only the information
already available from the output cipher. The following section explains how our secure instructions are
implemented.
4.2 Implementation of Secure Instructions
Our target 32-bit embedded processor has five-pipeline stages (fetch, decode, execute, memory access
and write back) and implements the integer instructions from the Simplescalar instruction set architecture
[16]. Its ISA is representative of current embedded 32-bit RISC cores used in smart cards such as the
ARM7-TDMI RISC core. We augment our target instruction set architecture with secure versions of select
instructions. To support these secure operations, the hardware should be modified as explained below.
Figure 3. Secure Load Architecture (dotted por tion is the augmented par t)
First, we provide an overview of the underlying reasons for the differences in the power consumption
because of data dependencies when executing these instructions. An assignment operation typically
involves loading a variable and storing it into another variable. We will consider the parts of the load
operation that are of interest. All stages of our pipeline (see Figure 3) till the memory access stage are
independent of the loaded data (note that revealing the address of data is not considered as a problem). The
memory access itself is not sensitive to the data being read due to the differential nature of the memory
reads. However, the output data bus switching depends on the data being transmitted. For example, let us
consider the different scenarios for the 1st bit (d0) of the 32-bit data read from the cache. If the values of d0
in two successive cycles are 0 and 1, it consumes more power than the case when the values are 0 and 0 in
these two cycles. Specifically, for an internal wire of 1pF and a supply voltage of 2.5V, the first case
consumes 6.25pJ more energy than the second case. The output from the memory access stage is fed to the
pipeline register before being forwarded for storing the data in the register file. Thus, based on whether a
bit value of one or zero is stored in the pipeline register bits, a different amount of energy is consumed.
Finally, the energy consumed in writing to a register is independent of the data as the register file can be
considered as another memory array.
The secure version of the load operation will need to mask all these energy differences due to bit
dependences. This is achieved by the following modifications to the architecture. The buses carrying the
data from a secure load are provided in both their normal and complementary forms. Thus, instead of a 32-
bit bus, we use a 64-bit bus. Thus, the number of 1s and 0s transmitted in the bus will both be 32. However,
this is not sufficient for masking the energy differences that depend on the number of transitions across the
Reg File
ALU
Mem
ory
Offset Fetch
Address Calculation
base
offset mem
memdata
!memdata Memory Fetch
Load into Reg
Dummy Capacitance Load
IF ID EXE MEM WB
bus. But this modification along with a pre-charged bus can mask this difference. All the 64 bus lines are
pre-charged to a value of one in the first phase of the clock. In the next evaluating phase, the bus settles to
its actual value. Exactly, 32 of the bus lines will discharge to a value of zero. In subsequent cycles, energy
is consumed only in pre-charging 32 lines independent of the input activity. The next modification involves
propagating the normal and complementary values until the write back stage. The complementary values
are terminated using a dummy capacitive load. The required enhancements to the underlying processor
architecture are illustrated in Figure 3. Similarly, a secure version of the store operation involves passing
along both the normal and complementary forms of the data read from the register file in the decode stage
to the memory access stage.
Figure 4. Code level representation of the left side operation
A secure assignment uses a combination of both the secure load and the secure store to mask the
energy behavior of the sensitive data. Figure 4 shows a specific elaboration of the use of the secure
assignment in assembly code for the assignment performed during the “ left side operation” . The high-level
assignment statement leads to a sequence of assembly instructions. The critical operations (the load and
………. / / Lef t Si de Oper at i on f or ( i =0; i <32; i ++) newL[ i ] = ol dR[ i ] ……. .
. . . . . . . . . . . . . . . . . $L12: . . . . . . . . . . . . . . . $L15: l w $2, i . . . . . . . . . . . . . . . l a $4, newL addu $3, $2, $4 move $2, $3 l w $3, i move $4, $3 s l l $3, $4, 2 l a $4, ol dR addu $3, $3, $4 move $4, $3 l w $3, 0( $4) sw $3, 0( $2) $L14: l w $3, i addu $2, $3, 1 move $3, $2 sw $3, i j $L12 $L13: . . . . . . . . . . . . . . . . . (a) Or iginal Assembly Code
. . . . . . . . . . . . . . . . . $L12: . . . . . . . . . . . . . . . $L15: l w $2, i . . . . . . . . . . . . . . . l a $4, newL addu $3, $2, $4 move $2, $3 l w $3, i move $4, $3 s l l $3, $4, 2 l a $4, ol dR addu $3, $3, $4 move $4, $3 sl w $3, 0( $4) ssw $3, 0( $2) $L14: l w $3, i addu $2, $3, 1 move $3, $2 sw $3, i j $L12 $L13: . . . . . . . . . . . . . . . . . (b) Modified Assembly Code
store instructions highlighted) whose energy behavior needs to be made data independent are then
converted to secure versions in our implementation by the optimizing compiler.
The secure 32-bit XOR instruction is implemented using complementary pre-charged circuit (see
Figure 5) that will ensure that for every XOR bit that discharges in the required circuit, the complementary
circuit will not discharge and vice-versa. In the first clock phase (when v =0), all (64 = 32 original + 32
complementary) the output nodes of the XOR circuit are pre-charged to one. In the next phase (when v=1),
half of them will discharge and the other half of them will remain at one. In subsequent cycles that use the
XOR, the energy is consumed only for charging 32 output nodes immaterial of the data activity.
During the S-Box operation, a 6-bit value is used to index a table. This operation is performed by a
load operation with the 6-bit value serving as the offset in our underlying architecture. Note that our
current secure load operation does not mask the energy difference due to differences in the offset. As these
6-bits are derived from the key, it is also important to hide the value of this offset. When the 6-bit value is
added as an offset to the base address of the table, the addition operation will consume an energy based on
the 6-bit value. In order to avoid this, we align the base address of the table such that the 6-bit value serves
as the least significant bits of the lookup and the most significant bits are determined at compile time.
Further, the inverted value of this 6-bit index is propagated to mask the energy consumption. Thus, the load
operations used for indexing are replaced by the secure indexing that generates the memory address using
our secure version.
Figure 5. XOR circuit and its complement. v is the clock. A and B are the inputs to the XOR function
In order to utilize these augmented architectural features, the compiler tags selected operations as
secure. Secure instructions can be implemented using either the unassigned opcodes (bits in the instruction
identifying the operation) in the processor architecture or by augmenting the original opcodes with an
additional secure bit per operand. In our implementation, we resort to the second option to minimize the
impact on the decoding logic. Whenever a secure version of the instruction is identified both the normal
and complementary versions of the appropriate segments of the processor become active. For example, for
the secure XOR operations, the data values (both source data and result data) are present in normal and
complementary forms in the internal data buses. Further, the required and complementary versions of the
circuit operate together. Since the additional parts consume extra power, the clock to the complementary
versions is gated to reduce energy consumption. The details of the gating (note that the complementary
version of the circuit is provided with a clock v gated with secure signal – secure v - for the evaluation
phase) for the XOR unit implementation are shown in Figure 5. Thus, as opposed to energy consumption of
0.06pJ in the secure mode, the XOR unit consumes only 0.03pJ in the normal mode. Additional savings in
energy also accrue during the execution of normal versions due to gating of the additional buses and the
pipeline registers.
4.3 Evaluation
To evaluate the effectiveness of our approach, we have implemented the DES algorithm in software
and captured the energy consumption in each cycle using a customized version of the publicly available
SimplePower [16], a cycle-accurate energy simulator. We focus only on the processor and buses in this
work, as memory power consumption is largely data-independent. The simulator uses validated transition-
sensitive energy models for both the buses and functional units obtained through detailed circuit simulation
and is within 9% of actual values [16]. It is able to accurately capture the differences in energy
consumption due to data transitions. The flexibility of working with the simulator provides us the ability to
monitor the energy consumed in every cycle (along with details of actual instructions executed) and also
helps us in quickly identifying the benefits or (otherwise) in modifying the underlying processor
architecture. Current measurement based approaches would be limited by the sampling speed of the
measuring devices and would also be more difficult to correlate the operations and sources of energy
consumption. The processor modeled for our simulation results is based on 0.25micron technology using
2.5V supply voltage.
Figure 6. Energy consumption trace of encryption (every 10 cycles)
First, we show the energy behavior of the original DES algorithm to demonstrate the type of
information that it leaks. Figure 6 shows the energy profile of the original encryption process revealing
clearly the 16 rounds of operation. This result reiterates that the energy profile can show what operations
are being performed. Next, we present a (differential) energy consumption trace for two different secret
keys to demonstrate that the energy consumption profiles can reveal more specific information.
Figure 7 illustrates the difference in energy consumption profiles generated for two different secret
keys using the same plaintext. This example illustrates that it is possible to identify differences in even a
single bit of the secret key. Similar observations on energy differences can also be made using differences
in one of key-related variables generated internally.
Figure 7. Difference between energy consumption profiles generated using two different secret keys
(vary in bit 10), 1st round
Ene
rgy
in p
Joul
e E
nerg
y in
pJo
ule
Time/Cycle
Time/Cycle (Original)
Figure 8. Difference between energy consumption profiles generated using two different keys before
masking process
Figures 8 and 9 show the difference between the two energy consumption traces generated using two
different secret keys and the same plaintext before and after the energy masking. These traces are shown
only for the first round of DES algorithm for clarity. The graphs clearly demonstrate that using secure
instructions can mask the energy behavior of the key related operations. While the effectiveness of the
algorithm is shown using differences between profiles generated from two different keys, the results hold
good for other key choices as well. Specifically, the mean of the energy consumption traces which generate
different internal (key related) bits will not exhibit any differences that can be exploited by DPA attacks.
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1 1940 3879 5818 7757 9696 11635 13574 15513 17452 19391
Time/Cycle (Masking)
Energ
y in
pJo
ule
Figure 9. Difference between energy consumption profiles generated using two different keys after
masking process
Figures 10 and 11 depict the difference between the energy consumption traces generated using two
different plain texts but the same secret keys.
Ene
rgy
in p
Joul
e
Time/Cycle (Original)
-30
-20
-10
0
10
20
30
1 2151 4301 6451 8601 10751 12901 15051 17201 19351
Time/Cycle (Original)
Ene
rgy
in p
Joul
e
Figure 10. Difference between energy consumption profiles generated using two different plaintexts
before masking process
-0.5
0
0.5
1
1.5
2
2.5
3
1 2177 4353 6529 8705 10881 13057 15233 17409 19585
Time/Cycle (Masking)
Ener
gy in
pJo
ule
Figure 11. Difference between energy consumption generated using two different plaintexts after
masking process
The first operation in the DES is plaintext permutation. Since this process is not operated in a secure
mode, the differences in the input values result in the difference in both the energy masked and original
versions. The other operations in the first round are secure; as a result, there are energy consumption power
differences.
However, the proposed solution is not without its drawbacks. The energy masking requires that the
same amount of energy be consumed independent of the data. Thus, additional energy is consumed in the
circuits added for the complementary portion of the circuit as shown in Figure 12. However, this additional
energy is 45 pJ per cycle (as compared to an average energy consumption of 165 pJ per cycle in the
original application). Note that we add excessive energy even in places where the differential profile in
Figure 8 shows no difference.
0
5
10
15
20
25
30
35
40
45
1 149 297 445 593 741 889 1037 1185 1333 1481 1629 1777
Time/Cycle (The 1st Key Permutation)
Ene
rgy
in p
Joule
Figure 12. Additional energy consumed due to the energy masking operation dur ing the 1st key
permutation
This is because the same secure instruction is used for parts of the input that are the same for both the
runs. Of course, in portions where the data was identical we have nothing to mask but we need to be
conservative to account for all possible inputs in a statistical test using large samples. It must also be
observed that our approach of using selective secure instructions helps to reduce the energy cost as
compared to a naïve implementation that balances energy consumption of all operations. For example,
looking at code segment shown earlier in Figure 4, we increase the energy cost of only one of the four load
operations executed in the segment. On the other hand, the naïve approach would convert all the four load
operations into secure loads thereby consuming significantly more energy than our strategy.
The total energy consumed without any masking operation is 46.4 uJoule. Our algorithm consumes
52.6 uJoule while the naïve approach consumes 63.6 uJoule (all loads and stores are secure instructions).
When all instructions are secure instructions, it will consume almost as twice as much as the original, 83.5
uJoule. This scheme is the one used in current dual-rail solutions [4].
5. RIJNDAEL ALGORITHM
Due to the weakness of the DES such as the short key length, a new encryption standard has been
introduced, called AES (Advanced Encryption Standard). One example of AES is the Rijndael algorithm
[18] proposed by J. Daemen and V. Rijmen. Rijndael is also a symmetric cryptography that has
independent block and key lengths. Both of the block length and key length can be multiples of 32 bits,
with the minimum of 128 bits and the maximum of 256 bits. Similar to the DES algorithm, the Rijndael
algorithm also runs in several rounds. The number of rounds (Nr) depends on the length of the key (K) and
the data block (D) as shown in Table 1.
Table 1. Number of Rounds
Nr D = 128 D = 192 D = 256
K = 128 10 12 14
K = 192 12 12 14
K = 256 14 14 14
All the operations in the Rijndael algorithm use a data block and a key block that are represented as an
array of bytes. The representation of the data block is a two-dimensional array, in which the number of row
is 4 and the number of columns is based on the length of the data block divided by 32 (called Nb). The key
block is also represented similar to the data block (the number of row is 4 and the number of column is key
length/32, called Nk).
The basic operation of the Rijndael encryption consists of three steps. The first step is an Initial Round Key
addition. This step contains two sub-steps: the key expansion and the round key selection. The main goal of
the key expansion step is to expand the key such that the number of round key bits is equal to the number
of block length multiplied by the number of rounds plus one [18]. The sub key used in each round will be
the sub-set of this expanded key, for instance the first sub key used in the first round will be the first key
block (the size of the key block is the same as the data block size) of the expanded key.
The second step of the Rijndael encryption is the Nr-1 rounds. Before these rounds, the plain text is
XORed with the extended secret key (AddRoundKey). Every round performs similar operations. The first
operation is called ByteSub operation, an operation that transforms the data block into a new data block
using S-box. The substitution table (S-Box) is invertible and composed by two transformations: the
multiplicative inverse in GF(s8) and an affine transformation over GF(2) [18]. The second operation is the
ShiftRow transformation that shifts cyclically every row of the output of ByteSub operation based on a
particular offset. The third operation is the MixColumn operation. The MixColumn operation XORs every
column of the output from ShiftRow with a fixed polynomial c(x). The final operation for each round is the
round key addition (AddRoundKey). This operation addition takes the output from MixColumn then XOR
them with the round key generated from the Initial Round Key Addition.
The last step of the Rijndael encryption is called Final Round operation. The final Round operation is
similar to the operations that happen in each round (1st round through Nr-1th round) except it does not
contain MixColumn operation. Figure 13 shows all the operations that take place in Rijndael encryption
(left side) as well as their secure versions (right side).
Figure 13. Modified Rijndael Algor ithm
From the Figure 13, we can see that all of the procedures use the secret key or data that have been modified
by the secret key. Consequently, to mask the energy consumption behavior, we need to mask all of those
operations. The operations used by Rijndael are similar to those used by DES, such as XOR, table lookup,
and shift.
5.1 Evaluation
To evaluate the Rijndael algorithm, we use the same procedures as we did for the DES algorithm. Figure
14 shows the original energy consumption behavior of the Rijndael encryption with the 256-bit key length
and 128-bit data length.
Initial Round Key Addition KeyExpansion(Key,xKey) AddRoundKey(Data,xKey) M th Rounds (M ax: Nr-1) ByteSub(Data) ShiftRow(Data) MixColumn(Data) AddRoundKey(Data,Km)
Final Rounds ByteSub(Data) ShiftRow(Data) AddRoundKey(Data,KNr)
Initial Round Key Addition sKeyExpansion(Key,xKey)
sAddRoundKey(Data,xKey) M th Rounds (M ax: Nr-1) sByteSub(Data) sShiftRow(Data) sMixColumn(Data) sAddRoundKey(Data,Km)
Final Rounds sByteSub(Data) sShiftRow(Data) sAddRoundKey(Data,KNr)
0
50
100
150
200
250
1 445 889 1333 1777 2221 2665 3109 3553 3997 4441 4885 5329
Time/Cycle (Original)
En
erg
y in
pJo
ule
Figure 14. Energy consumption trace of encryption
-300
-250
-200
-150
-100
-50
0
50
100
150
200
250
1 450 899 1348 1797 2246 2695 3144 3593 4042 4491 4940 5389
Time/Cycle
Ene
rgy
in p
Jou
le
Figure 15. Difference between energy consumption profiles generated using two different secret keys
Figure 15 illustrates the differences in energy consumption between two different keys with the same plain
text (the length of the plain text is only one data block). From this figure one can see that there is an energy
difference in the Initial Round Key Addition. This is not surprising since that operation uses a different
secret key. As for the remaining operations (Nr rounds), we also see that there is some differences in
energy consumption since the Nr rounds operate using the extended secret key generated by the Initial
Round Key Addition.
-80
-60
-40
-20
0
20
40
60
80
1 442 883 1324 1765 2206 2647 3088 3529 3970 4411 4852 5293
Time/Cycle
Ene
rgy
in p
Jou
le
Figure 16. Difference between energy consumption profiles generated using two different plain texts
Initial R. Key Nr-1 and Final Rounds
Initial R. Key Nr-1 and Final Rounds
Figure 16 shows the energy differences if we use different plain text but the same secret key. In this
scenario, the energy consumption of Initial Round Key Addition should be the same while the energy
consumed by all of the following operations (Nr-1 Round and Final Round) should give some differences;
meaning that all cycles should consume different amounts of energies.
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1 442 883 1324 1765 2206 2647 3088 3529 3970 4411 4852 5293
Time/Cycle
En
erg
y in
pJo
ule
Figure 17. Difference between energy consumption profiles generated using two different secret keys
after Masking
Finally, Figure 17 depicts the energy differences if we use secure operations for the two encryption
processes that use the same plain text but different secret keys. The energy consumptions of these two
processes should be the same due to the key-related energy masking operations. The total energy consumed
by the original method is 1.05 uJ while our secure operations consume 1.5uJ. On the other hand, when we
use the dual-rail logic, the energy consumption is 2uJ.
6. RANDOMIZATION
As explained earlier, the main goal of our method is to mask all the operations that use the secret key
directly or indirectly so that these operations will not reveal any information. There is another method that
can be used to protect information, which makes use of randomization. In the randomization method, the
main goal is different from ours. Specifically, it tries to introduce noise such that it is harder to observe the
information leaked by the power consumption behavior. One solution that can be adopted to provide
randomization is to accommodate another module that implements the same operation using a different
structure [20]. For example, one can introduce two types of adders, and the adder to be used at particular
cycle can be selected randomly. This method can make differential power analysis harder, but its
effectiveness depends largely on the number of additional modules we have (as well as the random
module).
Let us assume that we have two modules (type A and type B) for each ALU module (2 modules for adder,
two modules for multiplication, etc) on our five-stage pipeline. In the fetch stage, we do not need to access
ALU (assuming that the fetch stage has a dedicated adder). The write-back stage does not need the ALU
either. Consequently, the number of stages that access the ALU is three (decode, execute, and memory).
Table 2 shows several possible ALU usages for this pipeline. Let us assume, without loss of generality, that
the type A consumes more energy than the type B.
Table 2. Several possible ALU usages
Decode Stage Execute Stage M emory Stage
Single M odule (type A) Use type A Use type A Use type A
Single M odule (type B) Use type B Use type B Use type B
Two M odules (A and B) Use type A Use type A Use type B
Two M odules (A and B) Use type B Use type A Use type B
Two M odules (A and B) Use type B Use type B Use type A
Two M odules (A and B) Use type B Use type A Use type A
Two M odules (A and B) Use type A Use type B Use type A
Two M odules (A and B) Use type A Use type A Use type A
When we use two modules, there is a possibility that, at any cycle, the process uses the same type of
module as shown in Table 2. Therefore, if we execute the operation a sufficient number of times, we can
possibly have a result, which is similar to the result obtained when we only use a single module. If we take
the maximum of energy consumptions at particular cycle, the amount of energy consumed at that cycle
might be the same as the amount of energy consumption when we use module type A. The result of taking
the minimum energy is also similar to the result when we use only type B.
To improve the security using randomization, one can add more modules so that the probability of using
the same modules will be lower with the cost of extra circuit space.
We simulate randomization on the Rijndael algorithm using two modules of adder and three modules of
adder.
0
50
100
150
200
250
300
350
400
450
500
1 14 27 40 53 66 79 92 105 118 131 144 157 170 183 196Cycles
Ener
gy
in p
Joule
CLA Two Modules
Figure 18. Energy consumption traces using one module and two modules (a single run)
We use the randomization based approach to select between two adders (CLA and BrentKung), as shown
in Figure 18. The black line is the energy consumption traces when we use only CLA, while the grey line is
the energy consumption when we use the two adders. From these energy traces, one can see that the energy
consumption trace generated using two adders is not the same as the one obtained when we use only one
adder. Consequently by applying randomization, one can reduce the vulnerability to the Simple Power
Analysis attack. However, it should be noticed that this method is still vulnerable to the Differential Power
Analysis attack.
0
50
100
150
200
250
300
350
400
450
500
1 14 27 40 53 66 79 92 105 118 131 144 157 170 183 196Cycles
Ener
gy in
pJo
ule
CLA Two Modules
Figure 19. Energy consumption traces using one module and two modules (8runs)
If we execute this randomization based algorithm a sufficient number of times (we run eight times in our
experiment) and then take the maximum energy consumption for each cycle (it can also be minimum or the
median), we will have an energy consumption trace that is similar to the one generated from a single
module (CLA) as shown in Figure 19. This makes the DPA attack still possible (but it requires more time).
To improve the hardness of DPA attack, one can use three modules instead of two. The third module in our
experiments is built based on the Kogge Stone technique with the average power of 6.6095e-03.
0
50
100
150
200
250
300
350
400
450
500
1 10 19 28 37 46 55 64 73 82 91 100 109 118 127Cycles
Ener
gy in
pJo
ule
Brent-Kung Three Modules
Figure 20. Energy consumption traces using one module and three modules (8 runs)
Figure 20 shows the results with three modules for eight runs. As one can see from this figure, we
experience more energy consumption differences than we have in Figure 19. In order to obtain an energy
trace that is similar to the one generated from single module we need to have more runs as illustrated in
Figure 21. Specifically, if we execute 27 times, we can obtain energy traces which are similar to those
depicted in Figure 19. It concludes that using more modules, we can make DPA attack harder but still
possible.
0
50
100
150
200
250
300
350
400
1 9 17 25 33 41 49 57 65 73 81 89 97 105 113 121Cycles
Ener
gy
in p
Joule
Brent-Kung Three Modules
Figure 21. Energy consumption traces using one module and three modules (27 runs)
7. CONCLUSIONS
Smart cards, unlike magnetic stripe cards, can carry all necessary functions and information on the
card. Therefore, recent years have witnessed a significant increase in smart card usage throughout the
world. In fact, Data Monitor predicts that over 3 billion smart cards are in circulation worldwide. As a
result, ensuring secure use of these cards is receiving a lot of attention.
In this paper, we focus on addressing the problem of information leakage using power analysis and
propose a solution. The uniqueness of our solution comes from the fact that, unlike many previous
techniques, we approach the problem from an architectural perspective and consider adding secure
instructions to a given architecture. The purpose of these secure instructions is to hide the energy behavior
of sensitive variables in the application (e.g., key values in encryption algorithms). Our experiments with
the DES and Rijndael encryption algorithms demonstrate that the proposed solution is very effective in
preventing the information leakage due to power analysis. We also expand on the randomization based
method [20] by adding more modules in order to make Differential Power Analysis harder.
This work has certain limitations that will be addressed in the future. The use of complementary values
and dual rail logic alone will not be sufficient in the future. This is because power consumption differences
will also arise due to signal transitions on adjacent lines of on-chip buses [8]. Current dual-rail encoding
schemes do not mask the key leakage arising due to these differences. Thus, more work is required in
addressing power analysis attacks.
8. ACKNOWLEDGMENT
This material is based on work supported in part by the Office of Naval Research under Award No.
N00014-01-1-0859, MARCO 98-DF-600 GSRC Award, NSF Awards No. 0093082, 0093085, 0082064,
0103583. Any opinions, findings, and conclusions or recommendations expressed in this presentation are
those of the authors and do not necessarily reflect the views of the Office of Naval Research
9. REFERENCES
[1] Data Encryption Standard (DES). Federal Information Processing Standards Publication 46-2. 1993
December 30.
[2] E. Biham, A. Shamir. Power Analysis of The Key Scheduling of The AES Candidates. Proceedings of
the second AES Candidate Conference, March 1999, pp. 115-121.
[3] J. Coron. Resistance Against Differential Power Analysis for Elliptic Curve Cryptosystems. Ç.K Koç
and C. Paar, Eds., Cryptographic Hardware and Embedded Systems, vol. 1717 of Lecture Notes in
Computer Science, pp. 292-302, Springer-Verlag, 1999.
[4] J.-F. Dhem, N. Feyt. Hardware and Software Symbiosis Helps Smart card Evolution. IEEE Micro,
Vol. 21, Issue. 6, pp. 14-25, November – December 2001.
[5] L. Goubin, J. Patarin. DES and Differential Power Analysis The “Duplication” Method. Proceeding of
CHES’99, Springer, Lecture Notes in Computer Science, Vol. 1717, August 1999.
[6] P. Kocher. Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS and other Systems,
Advances in Cryptology, Proceedings of Crypto’96, LNCS 1109, N.Koblitz, Ed., Springer-Verlag,
1996, pp.104-113.
[7] P. Kocher, J. Jaffe, and B. Jun. Introduction to Differential Power Analysis and Related Attacks.
http://www.cryptography.com/dpa/technical,1998.
[8] P. Sotiriadis, A.P. Chandrakasan. Low Power Bus Coding Techniques Considering Interwire
Capacitance. IEEE Custom Integrated Circuits Conference, April 2000.
[9] P. Wayner. Code Breaker Cracks Smart Cards’ Digital Save. New York Times, June 22 1998.
[10] Secure Hash Standard. Federal Information, Processing Standards Publication 180-1, April 17 1995.
[11] S. Horwitz, T. Reps, and D. Binkley. Interprocedural Slicing Using Dependence Graphs. ACM
Transactions on Programming Languages and Systems 12, 1 (January 1990), 26-60.
[12] S. Moore, R. Anderson, and M. Kuhn. Improving Smart card Security Using Self-Timed Circuit
Technology. The Eight IEEE International Symposium on Asynchronous Circuit And Systems,
Manchester, UK, April 8 – April 11 2002.
[13] S. Skorobogatov, R. Anderson. Optical Fault Induction Attacks. IEEE Symposium on Security and
Privacy, 2002.
[14] S.W. Smith, E.R. Palmer, S. Weingart. Using a High-Performance, Programmable Secure
Coprocessor. Proceedings of the Second International Conference on Financial Cryptography.
Springer-Verlag Lecture Notes in computer Science, 1998.
[15] O. Kommerling and M. G. Kuhn. Design Principles for Tamper-Resistant Smart card Processors.
USENIX Workshop on Smart card Technology, Chicago, IL, May 10 – May 11 1999.
[16] W. Ye, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin, "The Design and Use of SimplePower: A
Cycle-Accurate Energy Estimation Tool". Design Automation Conference, June 2000.
[17] S. S. Muchnick. Advanced Compiler Design Implementation. Morgan Kaufmann Publishers, San
Francisco, CA, 1997.
[18] J. Daemen, and V. Rijmen. “Rijndael Block Cipher, AES Proposal” . Computer Security Resource
Center, National Institute of Standard Technology.
[19] K. Tiri, M. Akmal, and I. Verbauwhede, “A Dynamic and Differential CMOS Logic with Signal
Independent Power Consumption to Withstand Differential Power Analysis on Smart Cards” , 29th
European Solid-State Circuits Conference (ESSCIRC 2002), 2002.
[20] L. Benini, A. Macii, E. Macii, E. Omerbegovic, M. Pancino, and F. Pro, “Energy-Aware Design
Techniques for Differential Power Analysis Protection” , 40th Design Automation Conference,
Anaheim, CA, 2003.
[21] H. Saputra, N. Vijaykrishnan, M. Kandemir, M.J. Irwin, R. Brooks, S. Kim, and W. Zhang, “Masking
the Energy Behavior of DES Encryption” , Design Automation and Test in Europe Conference
(DATE’03), Messe – Munich, Germany, 2003.