2_Fast Montgomery Modular Multiplication and RSA
更新时间:2023-06-05 01:21:01 阅读量: 实用文档 文档下载
- 2fast4you推荐度:
- 相关推荐
RSA paper
Fast Montgomery Modular Multiplication and RSA
Cryptographic Processor Architectures
Ciaran McIvor, Máire McLoone, John V McCanny
The Institute of Electronics, Communications and Information Technology
School of Electrical and Electronic Engineering
The Queen’s University of Belfast, Stranmillis Road, Belfast, BT9 5AH
E-mail: c.mcivor@ee.qub.ac.uk, maire.mcloone@ee.qub.ac.uk, j.mccanny@ee.qub.ac.uk
Alan Daly, William Marnane
Department of Electrical and Electronic Engineering
University College Cork
Ireland
E-mail: aland@rennes.ucc.ie, l.marnane@ucc.ie
Abstract - New, generic silicon architectures for implementing Montgomery’s multiplication algorithm are presented. These use Carry Save Adders (CSAs) to perform the large word length additions required by this algorithm when used for RSA encryption and decryption. It is shown that using a four-to-two CSA with two extra registers rather than a five-to-two CSA leads to a useful reduction in the critical path of the multiplier, albeit at the expense of a small increase in circuitry. For operand lengths of 1536-bits and greater, the percentage gain in data throughput rate outweighs the percentage increase in silicon area. Moreover, for a 2048-bit operand length, typical of what is required in many future generation applications, the gain in data throughput is 27.9% compared with a 9.9% increase in area. The practical application of this approach has been demonstrated by applying this to the design of RSA processor architectures with 512-bit and 1024-bit key sizes. The resulting Montgomery multiplier and RSA processor performance results presented are the fastest reported to date in the literature.
I. INTRODUCTION
Public-key cryptosystems are important as they provide confidentiality, authentication, data integrity and non-repudiation [1,2]. RSA [3] is the most widely used public-key cryptosystem. An RSA operation is a modular exponentiation, which requires repeated modular multiplications. For security reasons RSA operand sizes need to be 1024-bits or greater in length meaning that high data throughput rates are difficult to achieve.
The Montgomery multiplication algorithm [4,5] is the most efficient modular multiplication algorithm available. It replaces trial division by the modulus with a series of additions and divisions by a power of 2. Thus, it is well suited to hardware implementations. For this reason, this algorithm has formed the basis of many of the RSA hardware architectures reported to date [6,7,8].
To date, several techniques have been proposed in order to avoid carry propagation during the addition stages of the algorithm as this is a key factor in determining performance. One approach proposed by Elbirt and Paar [7] is to break these additions into x-bit stages, where x is an optimal bit length
chosen to take advantage of the fast carry chains available on modern FPGAs. However, an inevitable drawback of this approach is that architectures developed can be very heavily technology and implementation dependent. For example, it is unlikely that a design developed for a specific FPGA family will show the same speed advantages if migrated to a modern ASIC technology or indeed an alternative type of FPGA or PLD . An alternative approach presented by Blum and Paar [8] is based on the use of FPGA systolic array multiplier architectures with varying processing element sizes (i.e. 4-bits, 8-bits and 16-bits). However, these systems have again been tailored specifically for the Xilinx XC4000 FPGA series, limiting, we believe, the broader application of the architectural ideas presented.
In this paper we introduce new, generic Montgomery multiplier and associated RSA processor architectures, which can be rapidly reconfigured to accommodate varying operand sizes and are suitable for implementation across a wide range of technology (FPGA, PLD, ASIC) options. These architectures are therefore ideally suited to a multitude of applications, including those that require a very high level of security to those, which only need relatively little protection over a shorter time period. As will be discussed, the designs created are highly competitive and we believe produce the best performance results obtained to date.
As will be discussed, long carry propagation can be avoided by using carry save adders (CSAs) to perform the addition stages of Montgomery’s algorithm. The inherently parallel structure of these adders means that a shallow combinational logic depth is guaranteed independent of the target technology onto which the CSAs are implemented. This also makes them ideal for very long integer arithmetic, a key requirement for RSA implementation. As is discussed in Section , we propose and compare two different multiplier architectures, one using a five-to-two CSA and the other a four-to-two CSA with two additional registers. Each multiplier can perform a Montgomery multiplication in only k+1 and k+2 clock cycles respectively, where k is the operand bit length. It is shown that
RSA paper
the latter architecture outperforms the former in terms of data so that they can employ different CSA variants when throughput rate, but vice versa in terms of area, for operand implemented architecturally. The motivation is to produce sizes greater than 512-bits. However, the percentage increase in silicon circuits with a small critical path delay. the throughput rate outweighs the percentage increase in area
for very large operand lengths in excess of 1536-bits. Thus, the B. Five-to-two CSA Architecture
four-to-two CSA architecture will become increasingly The first of these variants is based on a five-to-two CSA, attractive in the future when RSA key sizes of 2048-bits and which is used to calculate (4). This type of adder comprises
three levels of carry save logic, where each level of logic greater in length are needed to protect information.
As will be discussed in Section , an RSA processor using implements (5) and (6).
SUM = X1 xor X2 xor X3 (5) the five-to-two multiplier can carry out encryption in only
CARRY = (X1 and X2) or (X1 and X3) or (X2 and X3) (6) (k+2)(ek+3) clock cycles, where k is the modulus bit length and
ek is the public exponent bit length. Here the Fermat prime, The sum of the bit vectors SUM and CARRY is equal to the F4=216+1, is used as the public exponent. RSA decryption can sum of the three input bit vectors X1, X2 and X3. An outline be performed in (k/2+2)(dk/2+3) clock cycles, where dk is the diagram of the five-to-two CSA operation is shown in Fig. 1. private exponent bit length, also using the five-to-two multiplier. The Chinese Remainder Theorem (CRT) technique [9] is used in this case.
The concepts developed have been verified and demonstrated through FPGA implementations, in this case using the Xilinx Virtex2 family of chips. The performance results of these Montgomery multiplier and RSA processor implementations are, to the authors’ knowledge, the best reported in the literature to date.
II. MONTGOMERY MULTIPLIER ARCHITECTURES
A. Montgomery Multiplication Background
Given an integer a < n, where n is the k-bit modulus, A is said to be its n-residue with respect to r if,
A = a*r (mod n) where r=2k(1)
.
Likewise, given an integer b < n, B is said to be its n-residue with respect to r if,
B = b*r (mod n) (2)
The Montgomery product of A and B can then be defined as,
Z = A*B*r-1 (mod n) (3)
where r-1
is the inverse of r, modulo n. The radix-2 version of Montgomery’s multiplication algorithm [10], which calculates the Montgomery product of Aand B, is summarised in the pseudo code below.
Algorithm : Montgomery Multiplication (A, B, n) S[0] = 0; for i in 0 to k-1 loop qi = (S[i]0 + Ai * B0) mod 2; S[i+1] = (S[i] + Ai * B + qi * n) div 2; end loop; return S[k]; The critical delay of Algorithm occurs during the calculation of the S values given by the three input addition, S[i+1] = (S[i] + AThe main contributing factor to this delay is the i * B + qi * n) (4) carry propagation resulting from the very large operand additions. However, this can be avoided by using CSAs to calculate (4). This observation led us to re-write Algorithm in alternative formats, which we have defined as Algorithm and Algorithm in Sections .B and .C below. They are explicitly derived
The carry save representation of the five input operands is output from the register after only one clock cycle using this
approach. Algorithm can be re-written as Algorithm , given below. Note that the input operands A and B and the output product S are now represented in carry save format as A1 and A2, B1 and B2, and S1 and S2 respectively. CSR stands for carry save representation.
SUM CARRY Fig. 1. Block Diagram of Five-to-two CSA
Algorithm : Five-to-two CSA Montgomery Multiplication (A1, A2, B1, B2, n)
S1[0] = 0; S2[0] = 0; for i in 0 to k-1 loop qi = (S1[i]0 + S2[i]0) + (Ai * (B10 + B20)) mod 2; S1[i+1], S2[i+1] = CSR (S1[i] + S2[i] + Ai * (B1 + B2) + qi * n) div 2; end loop;
return S1[k], S2[k];
The critical delay of Algorithm occurs during the calculation of the five-to-two carry save addition, S1[i+1], S2[i+1] = CSR (S1[i] + S2[i] + Ai * (B1 + B2) + qi * n) (7)
RSA paper
The determination of the correct Ai values means that a full addition of the input operands A1 and A2 needs to be carried out. This is calculated “on-the-fly” using a barrel register full
A1 A2 B1 B2 n CLKRST
adder component (BRFA) as shown in Fig. 2.
Fig. 2. Barrel Register Full Adder Component
The correct Asignificant bit of the i value is ready when needed by adding the least
A1 and A2 input operands using a single
full adder. Once the current Ai value has been used, the A1 and
A2 operands are barrel shifted one place to the right so that the next Acycles are required to complete a full addition of the i value can be determined. In this way no extra clock A1 and A2input operands. As a result, (7) can be computed in a single clock cycle. Also, an extra one clock cycle is required to reset the signals S1[0] and S2[0] to zero at the beginning of each multiplication. Thus, Algorithm can be executed in only k+1 clock cycles, where k is the bit length of the operands. Also, due to the parallel nature of the carry save adders, a high Montgomery multiplier data throughput rate can be achieved here.
In order to verify this approach, a silicon design based on this was captured generically in VHDL and implemented into the Xilinx Virtex2 series of FPGAs using varying bit lengths. Table provides performance results for these
implementations.
TABLE
PERFORMANCE RESULTS FOR FIVE-TO-TWO CSA MULTIPLIER Xilinx Area Device Length Speed Rate
(k) (MHz) (Slices) (Mb/s)
XC2V1500 512 105.57 4,962 105.36
XC2V6000 2048 55.19 20,984
55.16
It should be noted that an increase in the operand bit length by a factor of 4 from 512-bits to 2048-bits results in an increase of only 1.9 in the critical path delay through the multiplier. On the other hand, the area of the multiplier approximately doubles as the bit length doubles. Thus, the architectures are highly scalable and regular, which is a useful characteristic as it ensures that the multiplier design will be of use in the future when operand sizes will need to be greater than 2048-bits in length. It should also be noted that the area figures presented include all the I/O registers required to store the large bit length operands, as shown in Fig. 3.
S1(k) S2(k) Fig. 3. I/O Interface of Five-to-two CSA Montgomery Multiplier For a 1024-bit operand length, the inputs are registered into the microchip at a rate of 32-bits per clock cycle. Thus, this process is completed after thirty-two clock cycles. Likewise, the outputs are clocked out of the chip 32-bits per clock cycle over thirty-two cycles. This is necessary due to the limited number of I/O pins available on currently available silicon chips. The same I/O interface technique, as shown in Fig. 3, is used for the four-to-two CSA multiplier architectures and the RSA processor architectures, which are detailed in Sections .C and .B respectively.
The approach described can be further extended to improve data throughput rate at the expense of silicon area. This is discussed in the next section.
C. Four-to-two CSA Architecture This architectural approach is based on the use of a four-to-two rather than a five-to-two CSA resulting in a saving of a full level of carry save logic, as shown in Fig. 4. Again, the carry save representation of the four input
operands is output from the register after only one clock cycle.
Therefore, by using the four-to-two CSA to calculate (4), Algorithm can be reformulated as Algorithm . SUM CARRY
Fig. 4. Block Diagram of Four-to-two CSA
RSA paper
Algorithm : Four-to-two CSA Montgomery Multiplication (A1, A2, B1, B2, n) D1, D2 = CSR (B1 + B2 + n + 0) S1[0] = 0; S2[0] = 0;
for i in 0 to k-1 loop
qi = (S1[i]0 + S2[i]0) + (Ai * (B10 + B20)) mod 2; if Ai = 0 and qi = 0 then
S1[i+1], S2[i+1] = CSR (S1[i] + S2[i] + 0 + 0) div 2; elsif Ai = 1 and qi = 0 then
S1[i+1], S2[i+1] =
CSR (S1[i] + S2[i] + B1 + B2) div 2; elsif Ai = 0 and qi = 1 then S1[i+1], S2[i+1] = CSR (S1[i] + S2[i] + n + 0) div 2;
else
S1[i+1], S2[i+1] =
CSR (S1[i] + S2[i] + D1 + D2) div 2; end if; end loop;
return S1[k], S2[k];
The main computation to be performed here is the four-to-two carry save addition,
S1[i+1], S2[i+1] = CSR (S1[i] + S2[i] + y + z) (8) The value of y and z in (8) depends on the value of both the Aand q. Indeed, the determination of i the state of both these signals simultaneously, represents extra i signals from Algorithm control logic in the form of a more complex multiplexer than is required for the five-to-two CSA case. This needs to be taken into account in determining the critical delay. However, as shown below, this delay is still less than that implied by Algorithm . This decrease in the critical path delay comes at the expense of two extra registers, D1 and D2. These are used to store the carry save representation of the input operands B1, B2 and n, which are added at the beginning of each new multiplication. D1 and D2 are possible values of y and z from (8). The calculation of D1 and D2 at the beginning of each multiplication means that an extra clock cycle is required to complete a full Montgomery multiplication. Also, the Arequired in Algorithm must again be computed “on-the-fly” i values using the BRFA component shown in Fig. 2. Thus, Algorithm can be executed in only k+2 clock cycles.
The four-to-two CSA design was also captured generically in VHDL and implemented into the Xilinx Virtex2 series of FPGAs for varying bit lengths. Table gives performance results for these implementations. These results can be directly compared with those given in Table for the five-to-two CSA multiplier. Fig. 5 provides a graphical representation of the comparisons in terms of percentage increase in data throughput rate and area when using the four-to-two CSA architecture rather than the five-to-two design.
For operand lengths greater than 512-bits, the four-to-two architecture has a consistently higher throughput rate than the five-to-two architecture. However, the percentage increase in area is higher than the percentage increase in throughput for operand lengths up to approximately 1536-bits.
TABLE
PERFORMANCE RESULTS FOR FOUR-TO-TWO CSA MULTIPLIER
Xilinx Area Device Length Speed (k) (MHz) (Slices) Rate
(Mb/s) XC2V1500 512 105.56 5,689
105.15 XC2V6000 1536 70.97 17,300 70.88
For operand lengths greater than this, however, the throughput benefits begin to outweigh the negative effects of increased area. In fact, for an operand length of 2048-bits, the percentage increase in throughput is 27.9% in comparison to a relatively small increase in area of 9.9%. The overall trend points towards a constant rise in the percentage throughput rate coupled with a decline in the percentage area increase as operand lengths increase. Therefore, it seems that the four-to-two CSA multiplier is of more use for RSA systems with key lengths in excess of 1536-bits and therefore becomes an increasingly attractive option in future RSA systems. For bit lengths less than this then the five-to-two CSA multiplier is the preferable option.
3025)
20%( tn15ecr10eP50-5
Operand Bit Length (k)
Fig. 5. % Throughput/Area Increase using 4-to-2 CSA instead of 5-to-2 CSA
The Montgomery multiplier performance results reported are, to the authors’ knowledge, the fastest reported in the literature to date. Also, the concepts outlined in this section can easily be applied to other Montgomery multiplier designs, which employ different addition techniques to calculate (4). The next section will demonstrate the practicality of the multiplier architectures through an implementation of the RSA cryptosystem for 512-bit and 1024-bit key sizes. For these key sizes, the five-to-two CSA multiplier is used to calculate the successive modular multiplications.
III. RSA PROCESSOR ARCHITECTURES
A. RSA and Modular Exponentiation
The RSA encryption and decryption functions are given by C=Me(mod n) and M=Cd(mod n) respectively, where M is a plaintext message block, C is a ciphertext block, n is the k-bit modulus, and e and d are the public and private exponents respectively. The equation ed=1(mod(p-1)(q-1)) must also hold, where p and q are two large prime numbers (~ k/2-bits in length) and n=pq.
RSA paper
Thus, an RSA operation is a modular exponentiation with operands satisfying the conditions stated above. A well known and widely used modular exponentiation algorithm is the square and multiply algorithm given as Algorithm V below, which computes M=Cd(mod n). dk is the bit length of the TABLE
CLOCK CYCLES TO COMPUTE ONE RSA EXPONENTIATION
Type of Operation Corresponding Algorithm 5 Number of Clock Cycles Number of Clock Cycles for private exponent d.
Algorithm 2k V: Modular Exponentiation (C, d, n) K = 2 (mod n); (computed externally) P[0] = MontMult (K, C, n); R[0] = MontMult (K, 1, n); for i in 0 to dk loop
P[i+1] = MontMult (P[i], P[i], n); if d[i] = 1 then
R[i+1] = MontMult (R[i], P[i], n) end if; end loop;
M = MontMult (1, R[k], n); return M;
B. The RSA Cryptoprocessor Architectures
By using the five-to-two CSA multiplier architecture to calculate the MontMult (Montgomery Multiplication) stages of Algorithm V, we can re-write this algorithm as Algorithm V, given below. As a result, we have been able to develop RSA processor architectures with very impressive data throughput rates.
Algorithm V: Five-to-two Multiplier Modular Exponentiation
(C, d, n) K = 22k
(mod n); (computed externally)
P1[0], P2[0] = 5to2_MontMult (K, 0, C, 0, n); R1[0], R2[0] = 5to2_MontMult (K, 0, 1, 0, n); for i in 0 to dk loop
P1[i+1], P2[i+1] =
5to2_MontMult (P1[i], P2[i], P1[i], P2[i], n); if d[i] = 1 then
R1[i+1], R2[i+1] =
5to2_MontMult (R1[i], R2[i], P1[i], P2[i], n) end if; end loop;
M1, M2 = 5to2_MontMult (1, 0, R1[k], R2[k], n); M = M1 + M2; return M;
An analysis of the total number of clock cycles required to compute a full RSA encryption and decryption operation is given in Table . Again, k is the bit length of the modulus and ek and dk are the bit lengths of the public and private exponents respectively. An extra clock cycle is now required to complete a full modular multiplication, as shown in Table . This is to allow the results of the intermediate multiplications and squarings in the for loop in Algorithm V to be registered and re-used as inputs to the next iteration of that loop. The pre- and post-processing operations are required to convert the operands
Calculations for Encryption
Decryption
Pre-P1[0],P2[0] processing R1[0],R2[0] k+2 k/2+2 For Loop
R1[i+1],R2[i+1]
ek(k+2) dk/2(k/2+2) Post-processing M1,M2
k+2 k/2+2 Addition
M
k+2 k/2+2 Total Number Clock Cycles:
(k+2)(ek+3)
(k/2+2)(dk/2+3)
in Algorithm V to and from n-residue format respectively, as required by the Montgomery multiplication algorithm described in Section . The final addition operation makes use of the BRFA component, as shown in Fig. 2, to perform a full addition of the carry save result, M1 and M2, of Algorithm V. By using the CRT technique [9], two half-size decryption operations can be computed in parallel. Hence, the amount of clock cycles required to compute this operation is decreased by approximately a factor of four, as shown in Table , thus increasing the potential data throughput rate of the RSA decryption function significantly.
RSA processor architectures based on the five-to-two CSA multiplier described were captured generically in VHDL and implemented using the Xilinx Virtex2 series of FPGAs (using the highest place and route effort level) for 512-bit and 1024-bit key sizes. Tables V and V provide performance results for these implementations. A public exponent of 216+1 and a k-bit private exponent were used during testing.
TABLE V
RSA ENCRYPTION RESULTS
Xilinx Bit Clock No. Throughput One RSA Device Length Speed CLB (k) (MHz) Slices Rate Encryption (Mb/s) (ms) XC2V3000 512 96.27 11,196 4.79 0.11
The RSA processor performance results reported here are, to the authors’ knowledge, the fastest reported in the literature to date. For instance, our 1024-bit encryption and decryption times are 3.4 and 3.9 times faster respectively than 1024-bit RSA encryption and decryption times recently reported by Blum and Paar [11]. They also reported a 1024-bit radix-16 decryption time of 3.10 ms, which is 1.2 times slower than our best result. Wu, Hong and Wu [12] also recently reported a VLSI RSA implementation with a 512-bit decryption throughput rate of between 328 kb/s and 578 kb/s. This is based on a 0.6 µm CMOS technology. This result is 2.5 and 1.4 times slower than our 512-bit decryption throughput rate, in the worst and best cases respectively.
RSA paper
TABLE V
RSA DECRYPTION RESULTS
Xilinx Device XC2V3000 Length (k) 512 Speed (MHz) 105.98 CLB Slices 13,513 Throughput Rate (kb/s) 812.03 Decryption (ms) 0.63
It should also be noted that both of the aforementioned implementations also applied the CRT technique [9] to RSA decryption and that Blum and Paar [11] also used a public Montgomery’s multiplication algorithm. CSAs were used to perform the large operand additions required when used in an RSA implementation. It has been shown that by using a four-to-two CSA with two extra registers rather than a five-to-two CSA, then a useful reduction in the critical path delay of the multiplier can be achieved albeit at the expense of some additional silicon. However, for very large operand lengths of 1536-bits and greater, the percentage gain in data throughput rate outweighs the percentage increase in area. For a 2048-bit operand length, the gain in data throughput is 27.9% compared with a 9.9% increase in area. Thus, the four-to-two CSA exponent of 216+1.
C. Further Discussions
As discussed in Section , our designs are technology independent and thus can be ported to other technologies without difficulty. Therefore, even higher data rates should be achievable when the designs are migrated to modern ASIC technology. Previous experience would indicate [13] that a further doubling in performance is possible if the architectures are implemented in a 0.18µm CMOS. Also, with some further modifications, our Montgomery multiplier designs can be tailored so that they can be implemented as arithmetic co-processors on modern microprocessor based smart cards. Typical RISC processors for such applications require around 80,000 gates with associated co-processors requiring between 80,000 and 190,000 gates depending on the bit length k, as shown in Table V .
TABLE V
MONTGOMERY MULTIPLIER AREA RESULTS (NUMBER OF GATES)
Five-to-two CSA 512 82,929
The figures in Table V are based on the use of a single Montgomery multiplier. As shown, the gate counts required are easily within the realms of what can be integrated on a single chip. Also, by exploiting certain time-area trade-offs such as reducing the amount of carry save logic used to evaluate (7) and (8), then the silicon area of our architectures can be reduced significantly at the expense of some extra latency. For example, the amount of carry save logic used in the five-to-two CSA multiplier architecture can be reduced by two thirds at the expense of two extra clock cycles required to compute (7). Thus, as discussed in Section , the architectures presented are highly flexible and can be used in a multitude of different industrial applications, including as high-speed network security processors or as low silicon area arithmetic co-processors on embedded smart cards.
IV. CONCLUSION
In this paper we have presented new, generic modular multiplication architectures suitable for implementing
multiplier is of more use for RSA systems with key lengths in excess of 1536-bits, whereas the five-to-two CSA multiplier is preferable otherwise. In order to demonstrate the practicality of our multipliers, RSA processor architectures were developed for 512-bit and 1024-bit key sizes using the five-to-two CSA multiplier. The resulting FPGA-based Montgomery multiplier and RSA processor performance results presented are, we believe, the fastest reported in the literature to date with future significant performance gains achievable through ASIC implementation.
ACKNOWLEDGMENT
Amphion Semiconductor Ltd. and a Northern Ireland Department of Learning postgraduate studentship in the form of a CAST award have funded this research.
REFERENCES
[1] Stallings, W.: “Network and Internetwork Security Principles and Practice”. Prentice Hall International, 1995.
[2] Menezes, A., Oorschot, P., Vanstone, S.: “Handbook of Applied
Cryptography”. CRC Press, 1997.
[3] Rivest, R.L., Shamir, A., Adleman, L.: “A Method for Obtaining Digital
Signatures and Public-Key Cryptosystems”. Communications of the ACM, 21(2): 120-126, February 1978.
[4] Montgomery, P.L.: “Modular Multiplication without Trial Division”.
Math. Computation, Vol. 44, pp. 519-521, 1985.
[5] Koc, C.K., Acar, T., Burton S. Kaliski Jr.: “Analyzing and Comparing
Montgomery Multiplication Algorithms”. IEEE Micro, 16(3): 26–33, June 1996.
[6] Eldridge, S.E., Walter, C.D.: “Hardware Implementation of
Montgomery’s Modular Multiplication Algorithm”. IEEE Transactions on Computers, Vol. 42, pp. 693–699, July 1993.
[7] Elbirt, A.J., Paar, C.: “Towards an FPGA Architecture Optimized for
Public-Key Algorithms”. SPIE Symposium on Voice, Video and Communications, Sept 1999.
[8] Blum, T., Paar, C.: “Montgomery Modular Exponentiation on
Reconfigurable Hardware”. Proceedings 14th Symposium on Computer Arithmetic, pp. 70-77, 1999.
[9] Quisquater, J-J., Couvreur, C.: “Fast Decipherment Algorithm for RSA
Public-Key Cryptosystem”. Electronics Letters, Vol. 18, pp. 905–907, October 1982. [10] Walter, C.D.: “Montgomery Exponentiation Needs No Final
Subtractions”. Electronics Letters, 35(21): 1831-1832, October 1999.
[11] Blum, T., Paar, C.: “High-Radix Montgomery Modular Exponentiation
on Reconfigurable Hardware”. IEEE Transactions on Computers, Vol. 50, pp. 759-764, July 2001.
[12] Wu, C-H., Hong, J-H., Wu, C-W.: “VLSI Design of RSA Cryptosystem
Based on the Chinese Remainder Theorem”. Journal of Information Science and Engineering 17, 967-980, 2001. [13] Amphion Semiconductor Ltd.: “Data Security Data Sheets”,
正在阅读:
2_Fast Montgomery Modular Multiplication and RSA06-05
河南省公务员四级联考历年真题 - 图文12-01
初三各科复习方法计划05-31
工厂公司食堂承包合同203-18
浅谈《大希庇阿斯篇》中的论美12-19
山洪灾害防御知识宣传手册10-02
七年级上信息技术测试题03-23
社区工作者考试试题库题库(300道)03-27
2021年度幼儿园教师师德心得体会08-04
- 1Fast Deadlock Detection in CCS Systems Using Petri Nets
- 2Automatic Tuning Matrix Multiplication Performance on Graphics hardware
- 3Signature schemes based on the strong RSA assumption
- 4Signature schemes based on the strong RSA assumption
- 5Instruction set compiled simulation A technique for fast and
- 6远程管理卡手册(RSA II)
- 7RSA加密算法java编程实现
- 8RSA加密解密算法c语言程序
- 9RSA加密算法java编程实现
- 10Scaling Up Fast Evolutionary Programming with Cooperative Coevolution
- 教学能力大赛决赛获奖-教学实施报告-(完整图文版)
- 互联网+数据中心行业分析报告
- 2017上海杨浦区高三一模数学试题及答案
- 招商部差旅接待管理制度(4-25)
- 学生游玩安全注意事项
- 学生信息管理系统(文档模板供参考)
- 叉车门架有限元分析及系统设计
- 2014帮助残疾人志愿者服务情况记录
- 叶绿体中色素的提取和分离实验
- 中国食物成分表2020年最新权威完整改进版
- 推动国土资源领域生态文明建设
- 给水管道冲洗和消毒记录
- 计算机软件专业自我评价
- 高中数学必修1-5知识点归纳
- 2018-2022年中国第五代移动通信技术(5G)产业深度分析及发展前景研究报告发展趋势(目录)
- 生产车间巡查制度
- 2018版中国光热发电行业深度研究报告目录
- (通用)2019年中考数学总复习 第一章 第四节 数的开方与二次根式课件
- 2017_2018学年高中语文第二单元第4课说数课件粤教版
- 上市新药Lumateperone(卢美哌隆)合成检索总结报告
- Multiplication
- Montgomery
- Modular
- Fast
- RSA
- 过程能力指数CPK计算
- 第九课第一框《海纳百川 有容乃大》导学案
- 书面表达范文Microsoft Word 文档
- 2015-2020年中国染发膏市场发展现状及市场评估报告
- 2019年河北省选调生考试《申论》试卷答案及解析
- 建筑工程质量员工作总结
- 柳子厚墓志铭 教案 定稿
- 室性早搏的鉴别诊断
- 美国留学 DIY 会计essay德州农工大学
- 将Win7和XP安装到同一个硬盘
- 51单片机C语言应用开发实例精讲目录
- 新概念第一册11-12教案
- 六轴机器人在机床上下料中的应用
- 责任胜于能力演讲稿
- 服装效果图教学大纲
- 外研版高中英语必修1单词表_带音标
- 天然气计量基础讲义
- 第六章 证券投资的行业分析
- 城市管理学任务03-0006
- 职场口语:超级实用的办公室英语