A Fast Hardware Architecture for Integer to Naf Conversion for Koblitz Curves

Ingeniería e Investigación

Print version ISSN 0120-5609

Ing. Investig. vol.34 no.2 Bogotá May/Aug. 2014

https://doi.org/10.15446/ing.investig.v34n2.40542

http://dx.doi.org/10.15446/ing.investig.v34n2.40542

Design of elliptic bend cryptoprocessors over GF(two¹⁶³) using the Gaussian normal basis Diseño de criptoprocesadores de curva elíptica sobre GF(2¹⁶³) usando bases normales Gaussianas

P.C. Realpe-Muñoz^ane, 5. Trujillo-Olaya² and J. Velasco-Medina³

¹ Paulo Cesar Realpe Muñoz. Bs in Physic Engineering, Universidad del Cauca, Republic of colombia. M.Sc. in Electronics Engineering, Universidad del Valle, Colombia. Affiliation: Universidad del Valle, Republic of colombia. E-postal service: paulo.realpe@correounivalle.edu.co

² Vladimir Trujillo Olaya. Bs in Electronic Applied science, Universidad del Valle, Republic of colombia. Chiliad. Sc. in Electronics Engineering, Universidad del Valle, Republic of colombia. Affiliation: Universidad del Valle, Republic of colombia. Electronic mail: vladimir.trujillo@correounivalle.edu.co

^three Jaime Velasco Medina. B.South in Electric Applied science, Academy del Valle, Colombia. Ph.D in Microelectronics, TIMA-INPG, France. Universidad del Valle, Colombia. Email: jaime.velasco@correounivalle.edu.co

How to cite: Realpe-Muñoz, P. C., Trujillo-Olaya, V., & Velasco-Medina, J. (2014). Blueprint of elliptic curve cryptoprocessors over GF(ii¹⁶³) using the Gaussian normal basis. Ingeniería eastward Investigación, 34(ii), 55-65.

Abstruse

This paper presents an efficient hardware implementation of cryptoprocessors that perform the scalar multiplication kP over a finite field GF(ii¹⁶³) using two digit-level multipliers. The finite field arithmetic operations were implemented using the Gaussian normal basis (GNB) representation, and the scalar multiplication kP was implemented using the Lopez-Dahab algorithm, the 2-not-next course (2-NAF) halve-and-add algorithm and the w-τNAF method for Koblitz curves. The processors were designed using a VHDL clarification, synthesized on the Stratix-4 FPGA using Quartus II 12.0 and verified using SignalTAP II and Matlab. The simulation results show that the cryptoprocessors provide a very proficient performance when performing the scalar multiplication kP. In this example, the computation times of the multiplication kP using the Lopez-Dahab algorithm, ii-NAF halve-and-add algorithm and 16-τNAF method for Koblitz curves were 13.37 µs, 16.90 µs and 5.05 µs, respectively.

Keywords: elliptic curve cryptography, Gaussian normal basis, digit-level multiplier, scalar multiplication.

RESUMEN

En este trabajo se presenta la implementación eficiente en hardware de criptoprocesadores que permiten llevar a cabo la multiplicación escalar kP sobre el campo finito GF(2¹⁶³) usando dos multiplicadores a nivel de digito. Las operaciones aritméticas de campo finito fueron implementadas usando la representación de bases normales Gaussianas (GNB), y la multiplicación escalar kP fue implementada usando el algoritmo de López-Dahab, el algoritmo de bisección de punto 2-NAF y el método due west-τNAF para curvas de Koblitz. Los criptoprocesadores fueron diseñados usando descripción VHDL, sintetizados en el FPGA Stratix-Four usando Quartus II 12.0 y verificados usando SignalTAP 2 y Matlab. Los resultados de simulación muestran que los criptoprocesadores presentan un muy buen desempeño para llevar a cabo la multiplicación escalar kP. En este caso, los tiempos de computo de la multiplicación kP usando Lopez-Dahab, bisección de punto ii-NAF y xvi-τNAF para curvas de Koblitz fueron xiii.37 µs, 16.90 µs and v.05 µs, respectivamente.

Palabras clave: criptografía de curva elíptica, bases normales Gaussianas, multiplicador a nivel de digito, multiplicación escalar.

Received: October 29th 2013 Accepted: February 25th 2014

Introduction

The use of computer networks and the steady increase in the number of users of these systems have driven the demand to meliorate security for the storage and transmission of data. There are many applications that must ensure the privacy, integrity or authentication of the data stored or transmitted. The security of the applications has been resolved by using different cryptographic algorithms, which are used in private- or public-key cryptosystems.

The security of public-central cryptosystems is based on mathematical issues that are computationally difficult to resolve, i.e., problems for which there are no known algorithms to resolve them in a practical fourth dimension. Because of the loftier volume of information processed, electronic systems are required to perform the encryption and decryption processes in the shortest time possible without compromising the security. In this regard, hardware implementations of cryptographic algorithms accept advantages, such as high speed, high security levels and low cost.

1 of the about important cryptosystems is the elliptic curve cryptosystem (ECC), proposed independently by Koblitz (Kobliz, 1987) and Miller (Miller, 1986). There accept been several investigations of the theory and practice of this cryptosystem. The results of the investigations demonstrated the ability of these systems to encrypt information and concluded that this cryptosystem offers better security, efficiency and memory usage. The hardware implementations of ECCs have many advantages and are used in equipment such as ATMs, smart cards, telephones, and cell phones.

In elliptic curve cryptography, it is known that finding the discrete logarithm of a random elliptic curve element with respect to a publicly known base point, that is, the elliptic curve discrete logarithm problem or ECDLP, has loftier hardness. The entire security of the ECC depends on the ability to compute the scalar multiplication and the inability to compute the multiplicand given the original and production points. Furthermore, the finite-field size of the elliptic curve determines the computational complexity of the higher up problem.

Several works regarding scalar multiplication over a finite field GF(2 ^m ) have been proposed and implemented efficiently in hardware.

C. Rebeiro and D. Mukhopadhyay (Rebeiro and Mukhopadhyay, 2008) presented a cryptoprocessor with novel multiplication and inversion algorithms. J.Y. Lai, T.Y. Hung, K.H. Yang and C.T. Huang (Lai et al., 2010) proposed an architecture for elliptic curves along with the operation scheduling for the Montgomery scalar multiplication algorithm. B. Muthukumar and South. Jeevananthan (Muthukumar and Jeevanahthan, 2010) implemented an elliptic bend co-processor, which is a dual-field processor with a projective coordinate. A.One thousand. Rahuman and G. Athisha (Rahuman and Athisha, 2010) presented an architecture using the Lopez-Dahab algorithm for the elliptic bend signal multiplication and Gaussian normal footing (GNB) for field arithmetic over GF(2¹⁶³). M. Amara and A. Siad (Amara and Siad, 2011) proposed an EC betoken multiplication processor intended for cryptographic applications such every bit digital signatures and cardinal agreement protocols. X. Cui and J. Yang (Cui and Yang, 2012) implemented a processor that parallelizes the computations of the ECC at the bit-level and gains a considerable speed-up. The processor is fully implemented in hardware and supports cardinal lengths of 113 bits, 163 $.25 and 193 bits.

In this context, we present in this piece of work efficient hardware implementations of cryptoprocessors over GF(2¹⁶³) using a GNB representation and the Lopez-Dahab algorithm, 2-NAF halve-and-add algorithm and due west-τNAF method for Koblitz curves (Anomalous Binary Curves or ABC) with window sizes of 2, iv, viii and 16 to perform the scalar multiplication kP.

The principal contributions of this work are: (i) the hardware design of cryptoprocessors using the GNB over GF(2¹⁶³) and three scalar multiplication algorithms (Lopez-Dahab, halve-and-add and w-τNAF method for Koblitz curves) to decide the best cryptoprocessor for embedded cryptographic applications. (ii) an efficient hardware implementation of cryptoprocessors based on the due west-τNAF method with different window sizes for the Koblitz curves. They present the all-time trade-off between the ciphering time and area, obtaining a higher performance than the other cryptoprocessors reported in the literature. Additionally, they are very suitable for hardware cryptosystems.

Mathematical groundwork

GNB representation

ANSI X9.62 (ANSI, 1999) describes the detailed specifications of the ECC protocols and uses the GNB to represent the finite field elements (NIST, 2000). An element over GF(two^yard) has the computational advantage of performing squaring very efficiently. However, multiplying distinct elements tin can be cumbersome. In this example, there are multiplication algorithms that brand this operation both simpler and more efficient.

A normal basis over GF(2^{one thousand}) is equally follows:

where β ∈ GF(2¹⁰⁰⁰) and whatsoever element A ∈ GF(ii¹⁰⁰⁰) tin can be written as follows:

The blazon T of a GNB is a positive integer and measures the complexity of the multiplication operation with respect to that basis. More often than not, the type T of a smaller value provides a more efficient multiplication. For a given m and T, the field GF(two^chiliad) can have at nigh 1 GNB of type T. A GNB exists whenever 1000 is not divisible by eight. Permit m and T be two positive integers. Then, the blazon T of a GNB over GF(2^m) exists if and only if p =Tm+one is prime number.

If is a GNB over GF(2^m), so the element is represented by the binary cord (a₀a_onea₂ ... a_m-1), where a_i ∈ {0,1}. In this instance, the multiplicative identity element is represented by the bit string of all ones.

The additive identity element is represented by the bit string of all zeros. An important result for the GNB arithmetic is Fermat's Theorem. For all β ∈ GF(2^m), and then

This theorem is important for performing the squaring of an element over GF(two^m).

Finite field arithmetic operations

The following arithmetics operations can be performed over GF(2^m) when using a normal footing of blazon T.

Addition: If A = (a ₀ a _ane a ₂...a _m-ane) and B = (b ₀ b ₁b₂...b_m-1) are elements over GF(two^m), then A + B = C = (c ₀ c ₁ c _ii...c _m-i), where c_i = (a_i + b_i ) modernistic 2.

Squaring: Let A = (a0a1a2...am-ane) ∈ GF(2^m), then

Based on Fermat's Theorem, , then

In this case, squaring is a elementary rotation of the vector representation.

Multiplication: The multiplication C = A·B is based on the multiplication matrix R_(m-1)XT (Masoleh, 2006). If A = (a ₀ a _ane a _ii...a _chiliad-i) and B = (b ₀ b ₁ b ₂...b _chiliad-1) are elements over GF(2^chiliad) and are represented using a GNB, then A· B = C = (c ₀ c ₁ c _ii...c _{one thousand-1}), where the coefficient c ₀ is given past equation (half dozen)

and R(i,j), 0 ≤ R(i,j) ≤ m-1, 1 ≤ i ≤ m-1, 1 ≤ j ≤ T denotes the (i, j)th element of the matrix. To obtain the ith coefficient of C, i.e., c_i , add "i mod chiliad" to all indices in (6).

Inversion: If A ≠ 0 and A ∈ GF(ii^g), the inverse of A is C ∈ GF(ii^m), and C is the only element of GF(ii^m) such that A·C = i, i.e., C = A ^-1. The algorithm used to calculate the inversion is based on equation (vii):

Itoh and Tsujii (Itoh and Tsujii, 1998) proposed a method that reduces the number of multiplications to calculate the inversion, and it is based on the following:

Trace: If A is an element over GF(2^thousand), the trace of A is:

If A = (a₀a₁a_ii...a_g-1 )and information technology is represented in a normal basis, then the trace tin exist computed efficiently as follows:

The trace of the element A has two possible values (0 or 1). Quadratic equation solving over GF(2^m): If A is an element of GF(2^yard) represented in a normal basis, then the quadratic equation:

has two - 2T solutions over GF(2^thousand), where T = Tr(A). Therefore, if T = 1, there is no solution, and if T = 0, at that place are two solutions. If z is ane solution, so the other solution is z + 1. For case, if A = 0, the solutions are z = 0 and z = 1 (IEEE std 1363, 2000). The algorithm 1 calculates the quadratic equation over GF(2^m) for a normal basis representation.

Square root: Let A = (a ₀ a _one a ₂...a _1000-1) ∈ GF(2^m), then

In this case, the foursquare root in a normal ground is a simple rotation of the vector representation (IEEE std 1363, 2000).

Elliptic curve arithmetic

A not-supersingular elliptic curve E(F_q ) is divers every bit a gear up of points (x, y) ∈ GF(2^k)×GF(2^{one thousand}) that satisfies the affine coordinates equation,

where a and b ∈ F_q and are constants with b ≠ 0 together with the betoken at infinity denoted past O. The group operations for the elliptic curve arithmetic in affine coordinates are defined as follows. Permit P = (x ₁, y _ane) and Q = (x _ii, y _ii) be two points that vest to the curve, and permit the add-on changed of P be defined equally -P = (x ₁, 10 _ane + y _ane). And so, if Q ≠ -P, the point P + Q = (x _three, y _three) tin can be computed as:

Using the group operations to a higher place, the elliptic bend scalar multiplication tin can exist defined as follows. Let E be an elliptic curve over GF(2^thou), permit Q and P ∈ E be two arbitrary elliptic points satisfying equation (xiii), and let k be an capricious positive integer. Then, the elliptic bend scalar multiplication Q = kP is defined every bit:

Because the grouping operations described in equations (14) and (fifteen) using the finite field arithmetics in affine coordinates, three main elliptic curve operations tin be divers: point addition, indicate doubling and bespeak halving. In the grouping operations, the inversion is the arithmetics operation that is almost expensive over GF(two^m), and this functioning can be avoided with a projective coordinate representation. In this case, the inversion is avoided by using the finite field multiplication.

A point P in the projective coordinates is represented using three coordinates (10, Y and Z). For the Lopez-Dahab (LD) projective coordinates (Lopez and Dahab, 1999), the projective signal (Ten : Y : Z) with Z ≠ 0 corresponds to the affine coordinates x = X/Z and y = Y/Z ². Then, equation (13) tin can be mapped from the affine coordinates to the LD projective coordinates as:

The three group operations for the elliptic curve arithmetics in the projective and affine coordinates can be computed as (Menezes et al., 2003):

1. Point doubling Q = iiP, where Q = (X ₃ : Y _iii : Z ₃) and P = (X ₁ : Y ₁ : Z _one) in the projective coordinates, can exist performed using 4 finite field multiplications, such every bit

ii. Indicate improver Q + P, where Q = (Ten ₁ : Y _i : Z ₁) in the projective coordinates and P = (x ₂, y ₂) in the affine coordinates, can be performed using 8 finite field multiplications, such as

three. Indicate halving Q/ii is the inverse operation of point doubling. Allow P = (x ₁, y₁) and Q = (x ₂, y ₂) be the points over the curve (13) in the affine coordinates. The betoken halving functioning is performed past computing P such that Q = iiP by solving the following equations:

Permit the λ-representation of a betoken Q = (x _two , ytwo) exist Q = (x, lQ), where

If Q in the λ-representation is the input of the betoken halving algorithm, then it is possible to compute indicate halving without using the affine coordinates. In scalar multiplication, repeated indicate halving operations can exist performed directly on the λ-representation. Notwithstanding, when a point addition is required, a conversion to the affine coordinates must be performed. Algorithm 2 computes the point halving operation.

Koblitz Curves

Koblitz curves, or anomalous binary curves, are elliptic curves defined over GF(2^m). The main advantage of these curves is that the scalar multiplication operation can exist performed without the use of signal doubling operations.

An algorithm for scalar multiplication on Koblitz curves is presented by Solinas (Solinas, 2000). The Solinas algorithm or the τ-adic window method computes a special τ-adic expansion of an integer number in . For example, a special τ-adic expansion is the window τ-adic non-adjacent grade (τNAF).

The Koblitz curves are curves defined over GF(two^m) by:

where a ∈ {0,1}, that is, curves East ₀ and E ₁.

These curves nowadays the post-obit property: If P(x, y) is a point on the curve East_a , then the point (x ^two, y ²) is also a point on E_a . In addition, they satisfy (x ⁴, y ⁴) + 2(x, y) = µ(10 ², y ⁱⁱ) for each bespeak (x, y) on E_a , where µ = (-i)^one-a. In GF(ii^m), the Frobenius map τ is an endomorphism that raises every element to its power of two, i.due east., τ : x → x ⁱⁱ. Then, the Frobenius endomorphism is performed efficiently (cost-free) when the elements of the finite field are represented in a normal basis (Cui and Yang, 2012). Koblitz shows that the bespeak doubling operation can be performed efficiently past using the Frobenius endomorphism, if the binary curve is defined over GF(2^m) and a ∈ {0, 1}. Then, the Frobenius map can be defined as τ: (ten, y) → (x ² , y ²). In this case, if the scalar g is represented in τNAF, and then

The τ-adic representation tin can be obtained by repeatedly dividing k by τ, where the remainders of each partition step are named digits u_i . This procedure is also used to obtain the representation'south NAF of the scalar yard, namely, k is repeatedly divided past 2. To decrease the number of bespeak additions for the scalar multiplication, it is necessary to obtain a τNAF representation of 1000 that achieves a smaller number of nonzero digits. The scalar multiplication tin be computed as:

The result corresponds to the Hamming weight of the τNAF, and it is equal to the binary NAF representation, i.e., the Hamming weight ≈ (log_two thou)/three, and the length of the τ-adic representation of k is approximately 2 m , which is twice the length of the binary NAF representation. Nevertheless, Solinas presents a method that reduces the length of the τ-adic representation to approximately thou. Thus, the Koblitz curves' arithmetic is based on the point improver and Frobenius map τ.

Hardware architectures for elliptic curve cryptoprocessors

In this section, nosotros present the hardware architectures for elliptic curve cryptoprocessors over GF(2¹⁶³) using a Gaussian normal basis. Each cryptoprocessor is designed using one algorithm for the scalar multiplication, namely, the Lopez-Dahab algorithm (Lopez and Dahab, 1999), the halve-and-add 2-NAF algorithm (Menezes et al., 2000) and the w-τNAF method for Koblitz curves with w = 2, 4, 8 and 16 (Solinas, 2000).

Digit-level multiplier

The finite field multiplication over GF(ii^thousand) is an operation that is more important for performing the scalar multiplication. Thus, this functioning must be implemented efficiently in hardware. There are several algorithms for performing the finite field multiplication that are presented in Azarderakhsh and Masoleh (2010), Huang et al. (2011,), Wang and Fan (2012) Lee and Chiou (2012).

Azarderakhsh and Masoleh (Azarderakhsh and Masoleh, 2010) proposed a serial or parallel digit-level multiplier with a digit-size d, where ane ≤ d ≤ 1000. In this case, if d = m, the multiplier is parallel and if d < m, it is serial and requires , clock cycles to generate all the m coefficients of C = A ·B = (c₀c_anec_two...c _k-1), where A = (a₀a_ia₂...a _m-1)and B = (b₀b₁b₂...b _thousand-1)are elements represented in a GNB over GF(2^chiliad). Figure 1 shows the digit-level GF(2^m) multiplier for T = 4, where A, B and C are registers for storing the input and output elements.

The block ρ is formed past the blocks ρ₁ and ρ₂, and its structure depends on type T of the GBN with T ≥ 2 and the multiplication matrix R. The block J is a set of m, ii-input AND gates. The block CS is a d-fold cyclic shift and an adder GF (two¹⁶³), which is a set of ii-input XOR gates.

The cake ρ_i is an optimal set of XOR gates that are obtained using (27), and ρ_two is a fix of XOR gates that are obtained from the main matrix ρ:

The time complexity of the digit-level multiplier is TA + (ii + )TX, where TX and TA are the delay time of a two-input XOR gate and a two-input AND gate, respectively. The surface area complexity of this multiplier is thousand ² ANDs and ≤ ii m ² - 2 m XORs (Azarderakhsh and Masoleh, 2010).

To implement the digit-level multiplier with a digit-size d = 55 in hardware, that is 1000 = iii clock cycles, a Matlab lawmaking is written to generate the equations of the blocks ρ_one and ρ_ii, which are synthesized using VHDL.

Hardware compages using the Lopez-Dahab algorithm

The scalar multiplication kP for not-supersingular elliptic curves over binary fields using the Lopez-Dahab algorithm is shown in Algorithm 3, which is a modified version of the Montgomery algorithm, where the same operations are performed during each iteration of the main loop (D. Hankerson et al., 2003).

In this case, the scalar multiplication is performed in three steps: 1) conversion of P from affine to projective coordinates; 2) compute Q = kP by add-on and doubling; and iii) conversion of Q from projective to affine coordinates.

To implement the above algorithm in hardware, we initially ascertain three functions: M_add () performs the point addition, Yard_double () performs the betoken doubling and Thou_xy () performs the conversion from projective to affine coordinates. These functions are defined equally follows:

where, (ten, y) and (x ₃, y _three) are the coordinates of points P and Q = kP, respectively.

Bespeak addition and point doubling are implemented in hardware using the information dependence graph shown in Figure 2, and the conversion from the projective to affine coordinates is implemented using two digit-level multipliers for the data dependence graph shown in Figure 3. The inversion functioning is implemented using the Itoh-Tsujii algorithm (Itoh and Tsujii, 1998).

Co-ordinate to Figures 2 and 3, the latencies for Thou_add and M_double and the projective to affine conversion are iii M and 15 One thousand +one, respectively, where M is the latency for a finite field multiplication.

In pace 4 of Figure iii, two multipliers are used, and one of them with the block of rotation performs the inversion of an element A ∈ GF(two¹⁶³). In this case, the latency of the inversion is ten M because information technology needs 10 finite field multiplications for m = 163. In step half-dozen, a multiplier is but used because the last operation of the coordinate conversion requires a multiplication.

The architecture of the cryptoprocessor over GF(ii¹⁶³) using the Lopez-Dahab algorithm is shown in Figure 4. It uses two register files, two parallel digit-level multipliers, one inversion block, several squaring and adder blocks, a main control and an FSM to perform the point addition, point double and conversion from the projective to affine coordinates.

The functional blocks that perform the finite field arithmetic operations over GF(2¹⁶³) for the Lopez-Dahab cryptoprocessor are shown in Figure 5. It is important to mention that the functioning of any cryptoprocessor depends on the efficient implementation of the hardware for the finite field arithmetic.

The chief command is an FSM that generates the command signals to perform the scalar multiplication, process the key, initialize the cryptoprocessor and control the I/O registers. The 2nd FSM performs the indicate addition, betoken doubling and conversion from the projective to the affine coordinates.

In Effigy 6, the ASM chart of the main command is shown, where the variables 10 ₁, Z ₁, Ten ₂ and Z _ii are initialized and stored in the annals files. Each fleck of the scalar grand is evaluated from left to right to perform the operations M_{add together} and M_double using the data dependence graph shown in Figure 2. If the flake k_i is '1', then M_add (Ten _i,Z _one,Ten _ii,Z _ii), 1000_double (X ₂,Z ₂) are computed. Else, G_add (10 _ii,Z _ii,X _i,Z _ane), M_double (Ten ₁,Z ₁). When all bits of the scalar thou are evaluated, the conversion from the projective to affine coordinates is executed using the data dependence graph shown in Effigy three, and kP in the affine coordinates is stored in the output annals.

Algorithm 3 is more than resistant against simple ability analysis and timing attacks. This is because the computation cost does not depend on the specific bit of the scalar k. For each bit of the scalar k, one bespeak improver and i betoken doubling are performed. The proposed scheme has two different execution paths depending on the current flake of the scalar chiliad. Both execution paths have the aforementioned complexity and crave the same number of clock cycles.

Hardware architecture using the halve-and-add together algorithm

Schroeppel (Schroeppel, 2000) and Knudsen (Knudsen, 1999) independently proposed the halve-and-add algorithm to accelerate the scalar multiplication on the elliptic curves divers over the binary extension fields. This algorithm uses an elliptic curve primitive called bespeak halving every bit shown in algorithm 2.

Considering, theoretically, the point halving operation is three times faster than the point doubling operation, it is possible to accelerate the scalar multiplication Q = kP by replacing the double-and-add algorithm with the halve-and-add algorithm, which uses an expansion of the scalar k in terms of negative powers of 2 (Mercurio et al., 2006).

In the halve-and-add algorithm, it is necessary to transform the integer thousand = (km-1,...,chiliad0)2. If k´ is defined by

where n represents the club of the base of operations betoken P, and then

Equation (29) can exist generalized to a window-NAF. The NAF _w of a positive integer k and w ≥ two is represented by the expression , where each nonzero coefficient one thousand_i is odd and at virtually, i of any w consecutive digits is nonzero. In this case, the NAF _{due west} of yard can be computed using algorithm four.

In this work, a Maple code is written to obtain the expansion coefficients NAF _w with west = 2, namely, the coefficients NAF _{due west} (2^τ-1 thou mod n), which are represented by 2-bits.

The halve-and-add algorithm is shown in algorithm 5. Step 3 of the algorithm performs the point addition Q_i + P in the Lopez-Dahab mixed coordinates(Q_i and P are represented in LD projective and affine coordinates, respectively) using equation (14) and the halving point P/2 in the affine coordinates or λ-representation, if bit k_i ´ ≠ 0; else, compute indicate halving. In this case, it is of import to mention that if the results of the first two operations A and B of equation (nineteen) are equal to zero, the bespeak doubling 2P is performed in the LD projective coordinates using equation (eighteen) with Ten ₁ = ten ₂, Y ₁ = y ₂ and Z ₁ = one.

The point addition in the LD mixed coordinates and the bespeak doubling in the LD projective coordinates are implemented in hardware using the data dependence graphs shown in Effigy vii and Figure 8, respectively. According to Figures 7 and 8, the latencies for the point addition and indicate doubling are 5M and Grand + three, respectively.

The architecture of the cryptoprocessor over GF(ii¹⁶³) using the halve-and-add algorithm is shown in Figure nine, and it uses 2 register files, ii digit-level finite multipliers, one solving quadratic equation block, 1 point halving block, several squaring and adder blocks, a main control and an FSM to perform the indicate add-on, point doubling and point halving.

The functional blocks that perform the finite field arithmetic operations over GF(2¹⁶³) for the halve-and-add together cryptoprocessor are shown in Figure x. In this case, finite field arithmetic operations are the improver, squarer, square root, trace, one-half trace (quadratic equation solving in a normal basis) and multiplication.

The main control is an FSM that generates the command signals to perform the scalar multiplication, process the central, initialize the cryptoprocessor and control the I/O registers. The 2d FSM performs the bespeak addition, point doubling and point halving.

In Effigy 11, the ASM chart of the main control is shown, where the sequence processing is as follows: initialize coordinate Q according to the sign of the bit k'_t-one ; perform the indicate halving operation on P; evaluate the bit k'_i for i > t-i; compute the indicate addition in the LD mixed coordinates and point halving on P if thousand'_i ≠ 0, else compute betoken halving; and perform the conversion of the indicate P in the λ-representation to the affine coordinates only when a point addition is required. Finally, Q = kP is obtained in the LD projective coordinates.

Algorithm seven performs the rounding of a circuitous number λ₀ + λ₁τ with λ₀ and λ₁ ∈ to obtain an element .

Hardware compages using the westward-τNAF algorithm

The length of the τ-adic representation for is roughly twice log _two(max(d ₀, d _i)). Solinas (Solinas, 2000) presents a method that reduces the length of the τ-adic representation. The objective is to find of small norm with ρ ≡ thousand (mod δ), where δ = (τ ^m - 1)/(τ - 1), and apply τNAF(ρ) to calculate ρP.

Algorithm vi calculates an chemical element ρ' ≡ k (modern δ), which is besides written as ρ' ≡ one thousand partmod δ. Solinas proved that 50(ρ) ≤ m + a and if C ≥ 2, so l(ρ´) ≤ 1000 + a + 3.

Permit w ≥ 2 exist a positive integer, and α_i = i modern τ ^west for i ∈ {1, 3, 5,..., 2^w-1-i}. A westward-τNAF expansion of an nonzero element κ ∈ Z[t] is an expression:

where u _i ∈ {0, ±α_ane, ±α_three,..., ±α2^west-one-i}, u ^l-one ≠ 0 and at most, 1 of any westward consecutive digits is nonzero. Then, kP = α_u0 P + τα_u1 P + ... + τ^l-oneα_50-1 P, when the scalar g is represented in west-τNAF.

The w-τNAF expansion tin be efficiently computed using algorithm viii, which tin exist viewed as an approach similar to the general NAF algorithm. In this piece of work, a Maple code is written to obtain the expansion due west-τNAF of the scalar k with due west = ii, 4 and eight, generating 8-bit expansion coefficients and w = 16, generating xvi-fleck expansion coefficients.

Solinas proposed algorithms to compute kP using the window τNAF method for the scalar one thousand, namely, kP is calculated using the w-τNAF method and Horner'due south rule (Solinas, 2000). An efficient scalar multiplication algorithm that uses the w-τNAF method is presented in algorithm 9, where pace 1 calculates the westward-τNAF of the scalar grand with the fractional reduction modulo δ = (τ ^m - 1)/(τ - ane), namely, w-τNAF(ρ ≡ thou mod (δ)), where ρ ≡ k mod (δ) is obtained from algorithms 6 and 7; stride ii generates the multiples of the betoken P and step 4.2 performs the point addition Q + P_u , when the bit u_i ≠ 0, and betoken doubling twoQ, when the results of the ii showtime operations A and B of equation (19) are equal to aught.

The point addition in the LD mixed coordinates and the point doubling in the LD projective coordinates with b = 1 are implemented in hardware using the data dependence graphs shown in Figure seven and Figure 8, respectively.

The compages of the cryptoprocessor over GF(2¹⁶³) using the w-τNAF algorithm for Koblitz curves is shown in Effigy 12, and information technology uses two register files, two digit-level finite multipliers, ane Frobenius map block, i RAM that stores the expansion coefficients due west-τNAF of the scalar g, two ROMs that store the pre-computed points Pu in the affine coordinates, which were obtained from Matlab for due west = 2, iv, 8 and sixteen, several squaring and adder blocks, a main control and an FSM to perform the point addition, point doubling and tQ.

The functional blocks that perform the finite field arithmetic operations over GF(2¹⁶³) for the w-τNAF cryptoprocessor for Koblitz curves are shown in Figure 13.

The main command is an FSM that generates the control signals to perform the scalar multiplication, process the key, initialize the cryptoprocessor and control the I/O registers. The second FSM performs the point addition, betoken doubling and tQ.

In Effigy xiv, the ASM chart of the main control is shown, where the sequence processing is as follows: initialize the Q coordinate co-ordinate to the sign of the bit u_i of the w-τNAF expansion; evaluate the bits u_i for i > τ-1; and compute the point addition in the LD mixed coordinates and the Frobenius map τ on Q, if u_i ≠ 0. Else, compute tQ. Finally, Q = kP is obtained in the LD projective coordinates. In Figure xiv, the ASM of the FSM is shown. One important remark is that the Koblitz curves are resistant to unproblematic power analysis and to all the known special attacks (T. Juhas, 2007).

Hardware verification and synthesis results

The López-Dahab, halve-and-add together and due west-τNAF cryptoprocessors are described using generic structural VHDL, are synthesized for a digit-size of d = 55 on the Stratix-IV FPGA (EP4SGX180HF35C2) using the Altera Quartus II version 12 design software for the implementation and are verified using SignalTap Two and Matlab.

Hardware verification of the cryptoprocessors

To verify the synthesis and simulation results of the cryptoprocessors, the following parameters for a pseudo-random elliptic curve are used according to the National Plant of Standards and Engineering (NIST, 2000):

i. Random elliptic curves B-163:
The class of the bend is: ytwo + xy = tenthree + x + b
Gx = 3F0EBA16286A2D57EA0991168D4994637E8343E36
Gy = 0D51FBC6C71A0094FA2CDD545B11C5C0C7973244F1
b = 20A601907B8C953CA1481EB10512F78744A3205FD

2. Koblitz elliptic curves G-163
The form of the bend is: y2 + xy = ten3 + x + 1
Gx = 2FE13C0537BBC11ACAA07D793DE4E6D5E5C94EEE8
Gy = 289070FB05D38FF58321F2E800536D538CCDAA3D9
n = 4000000000000000000020108A2E0CC0D99F8A5EF

In Figures xv through 17, the simulation results for the cryptoprocessors over GF(ii¹⁶³) in a GNB using SignalTAP Ii and Matlab are shown.

From Figures fifteen through 17, we can see that the results obtained from Matlab are the same as the results from SignalTAP Ii. So, the hardware results verify the right functionality of the designed cryptoprocessors.

Synthesis results for the cryptoprocessors

The synthesis results of the cryptoprocessors over GF(2¹⁶³) are shown in Tabular array 1. Additionally, some of the data presented in Table I are plotted in Figure xviii.

From Effigy xviii, we can see that the w-τNAF cryptoprocessor with w = 16 performs the scalar multiplication at a faster time (5.05 ms), and the halve-and-add processor with due west = two uses fewer area resource than the other processors.

Comparison of the results with other works

To compare the performance of the designed cryptoprocessors with respect to the cryptoprocessors presented in the literature, Table 2 shows several design parameters and processing times, such as area resources, frequency, kP time and time-area product. Nonetheless, it is important to mention that performing a off-white comparison in hardware design is very difficult considering there are other technical considerations, including the technologies, hardware platforms, software tools, scalar multiplication algorithms, finite field representations, and size of the fields.

From Table two, it is possible to observe that the GF(ii¹⁶³) cryptoprocessor presented in Mahadizadeh et al (2013) requires less time to perform the scalar multiplication than our processor based on the Lopez-Dahab algorithm considering the first processor uses iii digit-level multipliers, and our blueprint uses two digit-level multipliers, and the latency to compute Madd and Mdouble is 3M. However, the offset processor requires more area than our processor. Mercurio et al (2006) computes kP by using the one-half-and-add algorithm, grand=163, polynomial bases representation and one parallel multiplier. Our processor requires more area than the mentioned processor because information technology uses two digit-level multipliers, simply our blueprint requires less fourth dimension to perform the scalar multiplication, and the latency to compute the bespeak improver is 5M. Finally, our processor is based on the Koblitz curves and has a higher performance (expanse and time) than the processor presented in Azarderakhsh (2013) because our design has a latency of fiveYard to compute the indicate addition, and it uses ii digit-level multipliers and a window method that allows us to reduce the amount of indicate add-on operations.

From Table ii, it is possible to find that the GF(two¹⁶³) cryptoprocessor presented in Mahadizadeh et al (2013) requires less time to perform the scalar multiplication than our processor based on the Lopez-Dahab algorithm because the first processor uses three digit-level multipliers, and our design uses two digit-level multipliers, and the latency to compute Madd and Mdouble is 3M. However, the offset processor requires more area than our processor. Mercurio et al (2006) computes kP past using the half-and-add algorithm, grand=163, polynomial bases representation and one parallel multiplier. Our processor requires more area than the mentioned processor considering it uses two digit-level multipliers, but our design requires less time to perform the scalar multiplication, and the latency to compute the point addition is fiveG. Finally, our processor is based on the Koblitz curves and has a higher operation (area and fourth dimension) than the processor presented in Azarderakhsh (2013) considering our pattern has a latency of vM to compute the point addition, and it uses two digit-level multipliers and a window method that allows us to reduce the amount of betoken addition operations.

Conclusions

This work presents the design of elliptic curve cryptoprocessors to compute the scalar multiplication over GF(2¹⁶³) using the GNB. The Lopez-Dahab, halve-and-add and due west?τNAF algorithms are used to pattern the cryptoprocessors, which are described using generic structural VHDL, synthesized on the Stratix IV FPGA (EP4SGX180HF35C2).

Considering the hardware verification results, the 16-τNAF cryptoprocessor performs the scalar multiplication in less fourth dimension (v.05 ms), and the 2-NAF halve-and-add cryptoprocessor uses fewer area resource than the other processors, in this case, 22670 ALUTs. All the cryptoprocessors use roughly 17% of the ALUTs of the FPGA.

Additionally, it is important to mention that the algorithms are synthetized on the aforementioned hardware platform using Quartus II, are simulated in Modelsim, and are verified using SignalTAP and Matlab; the cryptoprocessors utilise 2 digit-level finite field multipliers over GF(two¹⁶³) in the GNB; the expansion coefficients for the private key k are obtained using the software Maple; and the FSMs apply a data dependence graph to perform yardP to achieve the minimal states.

Future work volition be oriented to increase the performance of the designed cryptoprocessors and the hardware implementation of the GF(2233) processors. Additionally, new cryptoprocessors will exist designed based on elliptic curves that are not included in the National Institute of Standards and Technology (NIST), such equally the Hessian and Edwards curves that perform the scalar multiplication kP.

References

Amara, M., & Siad, A. (2011). Hardware implementation of arithmetics for elliptic curve cryptosystems over GF(ii^m). In World Congress on Internet Security (WorldCIS) (pp. 73-78). London: IEEE. [ Links ]

Azarderakhsh, R., & Masoleh, R. (2010). A Modified Low Complexity Digit-Level Gaussian Normal Basis Multiplier. Arithmetic of Finite Fields (pp. 25-40). Turkey: Springer. [ Links ]

Azarderakhsh, R., & Masoleh, R. (2013). High-functioning implementation of point multiplication on Koblitz Curves. IEEE Transactions on Circuits and Systems, 60(ane), 41 - 45. [ Links ]

Chester, R., & Mukhopadhyay, D. (2008). Progress in Cryptology - INDOCRYPT 2008. Kharagpur: Springer. [ Links ]

Cui, Ten.-North., & Yang, J. (2012). An FPGA based processor for Elliptic Curve Cryptography. In International Conference on Computer Science and Information Processing (CSIP) (pp. 343-349). Shaanxi: IEEE. [ Links ]

Ghanmy, N., Khlif, N., Fourati, 50., & Kamoun, L. (2012). Hardware implementation of elliptic curve digital signature algorithm ECDSA on Koblitz curves. In International Symposium on Communication Systems, Networks & Digital Signal Processing (CSNDSP) (pp. ane - 6). Poznan: IEEE. [ Links ]

Hankerson, D., Menezes, A., & Vanstone, S. (2004). Guide to Elliptic Curve Cryptography. Springer. [ Links ]

Huang, T., Chang, C., Chiou, C., & Tan, S. (2011). Non-XOR approach for low-cost flake-parallel polynomial basis multiplier over GF(2^m). Information Security, IET, 5(3) 152-162. [ Links ]

IEEE std 1363. (2000). 1363-2000 IEEE Standard Specifications for Public-Primal Cryptography. IEEE Computer Society. [ Links ]

Itoh, T., & Tsujii, Due south. (1988). A fast algorithm for computing multiplicative inverses in GF(2^k) using normal bases. Information and Computation, 78(3), 171-177. [ Links ]

Jeevananthan, S., & Muthukumar, B. (2010). High speed hardware implementation of an elliptic curve cryptography (ECC) co-processor. In Trendz in Information Sciences & Computing (TISC) (pp. 176-180). Chennai: IEEE. [ Links ]

Johnson, D., Menezes, A., & Vastone, S. (2001). The Elliptic Bend Digital Signature Algorithm (ECDSA). International Journal of Information Security, one(one), 36-63. [ Links ]

Juhas, T. (2007). The Use of Elliptic Curves in Cryptography. Retrieved from: http://munin.uit.no/bitstream/handle/10037/1091/thesis.pdf?sequence=5 [ Links ]

Knudsen, W. (1999). Elliptic Scalar Multiplication Using Point Halving. Advances in Cryptology - ASIACRYPT (pp. 135-149). Berlin: Springer. [ Links ]

Koblitz, N. (1987). Elliptic curve cryptosystems. Mathematics of computation, 48(1987), 203-209. [ Links ]

Lai, J.-Y., Hung, T.-Y., Yang, K.-H., & Huang, C.-T. (2010). Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS) (pp. 3933 - 3936). Paris: IEEE. [ Links ]

Lee, C., & Chiou, C. (2012). Scalable Gaussian Normal Footing Multipliers over GF(2^m) Using Hankel Matrix-Vector Representation. Periodical of Betoken Processing Systems, 69(2), 197-211. [ Links ]

Lopez, J., & Dahab, R. (1999). Fast Multiplication on Elliptic Curves Over GF(2^m) without precomputation. Cryptographic Hardware and Embedded Systems, 1717, 316-327. [ Links ]

Mahdizadeh, H., & Masoumi, K. (2013). Novel Architecture for efficient FPGA implementation of elliptic curve cryptographic processor over GF(two^163). IEEE transactions on very large scale integration (VLSI) systems, 21(12), i-iv. [ Links ]

Malik, Thousand. (2010). Efficient implementation of Elliptic Bend Cryptography using depression-power Digital Signal Processor. In International Briefing on Avant-garde Advice Technology (ICACT) (pp. 1464-1468). Phoenix Park: IEEE. [ Links ]

Masoleh, R. (2006). Efficient algorithms and architectures for field multiplication using Gaussian normal ground. IEEE Transactions on Computers, 55(1), 34-47. [ Links ]

Mercurio, S., & Rodriguez, F. (2006). Elliptic Bend Scalar Multiplication using Bespeak Halving on Reconfigurable Hardware Platforms (pp. 1-5). Mexico: CiteSeerX. [ Links ]

Miller, V. (1986). Advances in Cryptology. In CRYPTO '85 Proceedings. Santa Barbara: Springer. [ Links ]

Morales, Due south., Uribe, F., & Badillo, A. (2011). A reconfigurable GF(2^m) elliptic curve cryptographic coprocessor. In Southern Conference on Programmable Logic (SPL) (pp. 209 - 214). Cordoba: IEEE. [ Links ]

NIST. (2013). Digital Signature Standard. Gaithersburg: Federal Information Processing Standards. [ Links ]

Rahuman, A., & Athisha, Grand. (2010). Reconfigurable compages for elliptic curve cryptography. In International Conference on Communication and Computational Intelligence (INCOCCI) (pp. 461-466). Erode: IEEE. [ Links ]

Schroeppel, R. (2000). Us Patent No. EP1232602. [ Links ]

Solinas, J. (2000). Efficient Arithmetics on Koblitz Curves. Designs, Codes and Cryptography, 19(2-3), 195-249. [ Links ]

Trujillo, V., & Velasco, J. (2010). Hardware Architectures for Elliptic Curve Cryptoprocessors Using Polynomial and Gaussian Normal Ground over GF(2^233). Lecture Notes in Figurer Scientific discipline, 6480, 79-103. [ Links ]

Wang, Z., & Fan, Southward. (2012). Efficient Montgomery-Based Semi-Systolic Multiplier for Fifty-fifty-Type GNB of GF(2^1000). IEEE Transactions on Computers, 61(3), 415-419. [ Links ]

A Fast Hardware Architecture for Integer to Naf Conversion for Koblitz Curves

Ingeniería e Investigación

Print version ISSN 0120-5609

Ing. Investig. vol.34 no.2 Bogotá May/Aug. 2014

https://doi.org/10.15446/ing.investig.v34n2.40542

0 Response to "A Fast Hardware Architecture for Integer to Naf Conversion for Koblitz Curves"

Post a Comment

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel