IOS Press

A Complete Generalization of Atkin’s Square Root Algorithm

Armand Stefan Rotaru

Institute of Computer Science, Romanian Academy, Carol I no. 8, 700505 Iasi, Romania

[email protected]

Sorin Iftene

Department of Computer Science, Alexandru Ioan Cuza University, General Berthelot no. 16, 700483 Iasi, Romania

[email protected]

Abstract. Atkin's algorithm [2] for computing square roots in $\mathbb{Z}_p$, where $p$ is a prime such that $p \equiv 5 \bmod 8$, has been extended by Müller [15] for the case $p \equiv 9 \bmod 16$. In this paper we extend Atkin's algorithm to the general case $p \equiv 2^s + 1 \bmod 2^{s+1}$, for any $s \geq 2$, thus providing a complete solution for the case $p \equiv 1 \bmod 4$. Complexity analysis and comparisons with other methods are also provided.

Keywords: Square Roots, Efficient Computation, Complexity

1. Introduction

Computing square roots in finite fields is a fundamental problem in number theory, with major applications related to primality testing [3], factorization [17] or elliptic point compression [10]. In this paper we consider the problem of finding square roots in $\mathbb{Z}_p$, where $p$ is an odd prime. We have to remark that, using Hensel's lemma and the Chinese remainder theorem, the problem of finding square roots modulo any composite number can be reduced to the case of a prime modulus, by considering its prime factorization (for more details, see [4]).

Address for correspondence: Institute of Computer Science, Romanian Academy, Carol I no. 8, 700505 Iasi, Romania


According to Bach and Shallit [4, Notes on Chapter 7, page 194] and Lemmermeyer [13, Exercise 1.16, page 29], Lagrange was the first to derive an explicit formula for the case $p \equiv 3 \bmod 4$, in 1769. According to the same sources ([4, Exercise 1, page 188] and [13, Exercise 1.17, page 29]), the case $p \equiv 5 \bmod 8$ was solved by Legendre in 1785. Atkin [2] also found a simple solution for the case $p \equiv 5 \bmod 8$ in 1992. In 2004, Müller [15] extended Atkin's algorithm to the case $p \equiv 9 \bmod 16$ and left further development of Atkin's algorithm as an open problem. In this paper we extend Atkin's algorithm to the case $p \equiv 2^s + 1 \bmod 2^{s+1}$, for any $s \geq 2$, thus providing a complete solution for the case $p \equiv 1 \bmod 4$. Müller's algorithm and our generalization use quadratic non-residues, and thus they are probabilistic algorithms. We remark that several deterministic approaches for computing square roots modulo a prime $p$ have also been presented in the literature. Schoof [19] proposed an impractical deterministic algorithm of complexity $O((\log_2 p)^9)$. Sze [21] has recently developed a deterministic algorithm for computing square roots which is efficient (its complexity is $\tilde{O}((\log_2 p)^2)$) only for certain primes $p$.

The paper is structured as follows. Section 2 is dedicated to some mathematical preliminaries on quadratic residues and square roots. Section 3 presents Atkin's algorithm and its extension (Müller's algorithm), both based on computing square roots of $-1$ modulo $p$. We present our generalization in Section 4. Its performance, efficient implementation and comparisons with other methods are presented in Section 5. In the last section we briefly discuss the conclusions of our paper and the possibility of adapting our algorithm for other finite fields.

2. Mathematical Background

In this section we will present some basic facts on quadratic residues and square roots. For simplicity of notation, from this point forward we will omit the modular reduction, but the reader must be aware that all computations are performed modulo $p$, if not explicitly stated otherwise.

Let $p$ be a prime and $a \in \mathbb{Z}_p$. We say that $a$ is a quadratic residue modulo $p$ if there exists $b \in \mathbb{Z}_p$ with the property $a = b^2$. Otherwise, $a$ is a quadratic non-residue modulo $p$. It is easy to see that the product of two residues is a residue and that the product of a residue with a non-residue is a non-residue.

If $b^2 = a$ then $b$ will be referred to as a square root of $a$ (modulo $p$) and we will simply denote this fact by $b = \sqrt{a}$. We have to remark that if $a$ is a quadratic residue modulo $p$, $p$ prime, then $a$ has exactly two square roots: if $b$ is a square root of $a$, then $p - b$ is the other one. In particular, $1$ has the square roots $1$ and $-1$ (in this case, $-1$ will be regarded as being $p - 1$) or, equivalently, $a^2 = 1 \Leftrightarrow (a = 1 \lor a = -1)$.

The Legendre symbol of $a$ modulo $p$, denoted as $\left(\frac{a}{p}\right)$, is defined to be equal to $\pm 1$ depending on whether $a$ is a quadratic residue modulo $p$. More exactly,

$$\left(\frac{a}{p}\right) = \begin{cases} 1, & \text{if } a \text{ is a quadratic residue modulo } p;\\ -1, & \text{otherwise.} \end{cases}$$

Euler's criterion states that, for any prime $p$ and $a \in \mathbb{Z}_p$, the following relation holds:

$$a^{\frac{p-1}{2}} = \left(\frac{a}{p}\right).$$


Euler's criterion provides a method of computing the Legendre symbol of $a$ modulo $p$ using an exponentiation modulo $p$, whose complexity is $O((\log_2 p)^3)$. There are faster methods for evaluating the Legendre symbol; see, for example, [8], which presents algorithms of complexity $O\left(\frac{(\log_2 p)^2}{\log_2 \log_2 p}\right)$ for computing the Jacobi symbol (the Jacobi symbol is a generalization of the Legendre symbol to arbitrary moduli).

Another useful property is that $\left(\frac{2}{p}\right) = (-1)^{\frac{p^2-1}{8}}$, which implies that $2$ is a quadratic residue modulo $p$ if and only if $p \equiv \pm 1 \bmod 8$.

If $p$ is prime, $p \equiv 3 \bmod 4$, and $a \in \mathbb{Z}_p$ is a quadratic residue modulo $p$, then $b = a^{\frac{p+1}{4}}$ is a square root of $a$ modulo $p$. Indeed, in this case, $b^2 = a^{\frac{p+1}{2}} = a \cdot a^{\frac{p-1}{2}} = a \cdot \left(\frac{a}{p}\right) = a \cdot 1 = a$. Thus, in this case, finding square roots modulo $p$ requires only a single exponentiation modulo $p$. In the next sections we will focus on the case $p$ prime, $p \equiv 1 \bmod 4$.
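The two facts above translate directly into code. The following is a minimal Python sketch (ours, for illustration only, not taken from the paper) of Euler's criterion and of the single-exponentiation square root for $p \equiv 3 \bmod 4$; the function names are our own.

```python
def legendre(a, p):
    """Legendre symbol via Euler's criterion: a^((p-1)/2) mod p is 1 or p-1."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def sqrt_3mod4(a, p):
    """Square root of a quadratic residue a modulo a prime p with p = 3 (mod 4)."""
    assert p % 4 == 3 and legendre(a, p) == 1
    return pow(a, (p + 1) // 4, p)

# Example: 5 = 4^2 (mod 11), and 11 = 3 (mod 4).
b = sqrt_3mod4(5, 11)
assert b * b % 11 == 5
```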

3. Square Root Algorithms based on Computing $\sqrt{-1}$

In this section we present two methods for computing square roots for the cases $p \equiv 5 \bmod 8$ and $p \equiv 9 \bmod 16$, both based on computing square roots of $-1$ modulo $p$.

3.1. Atkin’s Algorithm

Let $p$ be a prime such that $p \equiv 5 \bmod 8$ and $a$ a quadratic residue modulo $p$. Atkin's idea [2] is to express $\sqrt{a}$ as $\sqrt{a} = \alpha a(\beta - 1)$, where $\beta^2 = -1$ and $2a\alpha^2 = \beta$. Indeed, in this case, $(\alpha a(\beta - 1))^2 = a(-2a\alpha^2\beta) = a(-\beta^2) = a$. Moreover, in order to easily determine $\alpha$, it will be convenient that $\beta$ has the form $\beta = (2a)^k$, with $k$ odd. Thus, the major challenge is to find $\sqrt{-1}$ of the mentioned form.

By Euler's criterion, the relation $(2a)^{\frac{p-1}{2}} = -1$ holds ($a$ is a quadratic residue, but $2$ is a quadratic non-residue, therefore $2a$ is a quadratic non-residue), so we can choose $\beta$ as $\beta = (2a)^{\frac{p-1}{4}}$ and $\alpha$ as $\alpha = (2a)^{\frac{\frac{p-1}{4}-1}{2}} = (2a)^{\frac{p-5}{8}}$.

The resulting algorithm is presented in Figure 1.

Atkin's Algorithm(p, a)

input: p prime such that p ≡ 5 mod 8, a ∈ Z_p a quadratic residue;
output: b, a square root of a modulo p;

begin
1. α := (2a)^((p-5)/8);
2. β := 2aα^2;
3. b := αa(β - 1);
4. return b
end.

Figure 1: Atkin's algorithm


Atkin's algorithm requires one exponentiation (in Step 1) and four multiplications (two multiplications in Step 2 and two multiplications in Step 3).
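As a concrete illustration (our own sketch, not the authors' code), Figure 1 translates into a few lines of Python:

```python
def atkin_sqrt(a, p):
    """Square root of a quadratic residue a modulo a prime p with p = 5 (mod 8)."""
    assert p % 8 == 5
    alpha = pow(2 * a, (p - 5) // 8, p)       # Step 1: alpha = (2a)^((p-5)/8)
    beta = 2 * a * alpha * alpha % p          # Step 2: beta = 2a*alpha^2, a square root of -1
    return alpha * a * (beta - 1) % p         # Step 3: b = alpha*a*(beta - 1)

p, a = 13, 10                                 # 13 = 5 (mod 8) and 10 = 6^2 (mod 13)
b = atkin_sqrt(a, p)
assert b * b % p == a
```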

3.2. Müller's Algorithm

Let $p$ be a prime such that $p \equiv 9 \bmod 16$ and $a$ a quadratic residue modulo $p$. Müller [15] has extended Atkin's algorithm by expressing $\sqrt{a}$ as $\sqrt{a} = \alpha a d(\beta - 1)$, where $\beta^2 = -1$ and $2ad^2\alpha^2 = \beta$. Indeed, in this case, $(\alpha a d(\beta - 1))^2 = a(-2ad^2\alpha^2\beta) = a(-\beta^2) = a$. Moreover, in order to easily determine $\alpha$, it will be convenient that $\beta$ has the form $\beta = (2ad^2)^k$, with $k$ odd.

By Euler's criterion, the relation $(2a)^{\frac{p-1}{2}} = 1$ holds ($a$ and $2$ are quadratic residues, therefore $2a$ is a quadratic residue). We have two cases:

(I) $(2a)^{\frac{p-1}{4}} = -1$: in this case we can choose $\beta$ as $\beta = (2a)^{\frac{p-1}{8}}$ and $\alpha$ as $\alpha = (2a)^{\frac{\frac{p-1}{8}-1}{2}} = (2a)^{\frac{p-9}{16}}$ ($d = 1$);

(II) $(2a)^{\frac{p-1}{4}} = 1$: in this case we need a quadratic non-residue $d$; by Euler's criterion, $d^{\frac{p-1}{2}} = -1$ and, thus, $(2ad^2)^{\frac{p-1}{4}} = -1$, so we can choose $\beta$ as $\beta = (2ad^2)^{\frac{p-1}{8}}$ and $\alpha$ as $\alpha = (2ad^2)^{\frac{\frac{p-1}{8}-1}{2}} = (2ad^2)^{\frac{p-9}{16}}$.

The above presentation is in fact a slightly modified variant of the original one: for Case (I), Müller used an arbitrary residue $d$. Kong et al. [11] have remarked that using $d = 1$ in this case leads to an important improvement of the performance of Müller's original algorithm, by requiring only one exponentiation for half of the squares in $\mathbb{Z}_p$ (Case (I)) and two for the rest (Case (II)).

The resulting algorithm is presented in Figure 2.

In case $(2a)^{\frac{p-1}{4}} = -1$, Müller's algorithm requires one exponentiation (Step 1) and five multiplications (two multiplications in Step 2, one multiplication in Step 3 and two multiplications in Step 4). In case $(2a)^{\frac{p-1}{4}} = 1$, Müller's algorithm, besides the operations in Steps 1-3, requires one more exponentiation (Step 8) and eight more multiplications (one multiplication in Step 8, four multiplications in Step 9 and three multiplications in Step 10). Additionally, Step 7 requires, on average, two quadratic character evaluations (generate randomly $d \in \mathbb{Z}_p$ until $\left(\frac{d}{p}\right) = -1$; because half of the elements are quadratic non-residues, two generations are required on average). It is interesting to remark that Ankeny [1] has proven that, by assuming the Extended Riemann Hypothesis (ERH), the least quadratic non-residue modulo $p$ is in $O((\log_2 p)^2)$. As a consequence, in this case, the presented probabilistic algorithm for finding a quadratic non-residue can be transformed into a deterministic polynomial time algorithm of complexity $O((\log_2 p)^4)$.

4. A Complete Generalization of Atkin’s Square Root Algorithm

In this section we extend Atkin's algorithm to the case $p \equiv 2^s + 1 \bmod 2^{s+1}$, for any $s \geq 2$, thus providing a complete solution for the case $p \equiv 1 \bmod 4$. For any prime $p$, with $p \equiv 1 \bmod 4$, we can express $p - 1$ as $p - 1 = 2^s t$, where $s \geq 2$ and $t$ is odd. If we write $t$ as $t = 2t' + 1$, we obtain that $p = 2^{s+1}t' + 2^s + 1$, which implies that $p \equiv 2^s + 1 \bmod 2^{s+1}$.


Müller's Algorithm(p, a)

input: p prime such that p ≡ 9 mod 16, a ∈ Z_p a quadratic residue;
output: b, a square root of a modulo p;

begin
1. α := (2a)^((p-9)/16);
2. β := 2aα^2;
3. if β^2 = -1
4.   then b := αa(β - 1);
5. else
6. begin
7.   generate d, a quadratic non-residue modulo p;
8.   α := αd^((p-9)/8);
9.   β := 2ad^2α^2;
10.  b := αad(β - 1);
11. end
12. return b
end.

Figure 2: Müller's algorithm
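Similarly, Figure 2 can be transcribed into Python as follows (again our own sketch; the non-residue $d$ is found here by trial search instead of random generation):

```python
def mueller_sqrt(a, p):
    """Square root of a quadratic residue a modulo a prime p with p = 9 (mod 16)."""
    assert p % 16 == 9
    alpha = pow(2 * a, (p - 9) // 16, p)             # Step 1
    beta = 2 * a * alpha * alpha % p                 # Step 2
    if beta * beta % p == p - 1:                     # Case (I): beta^2 = -1, so d = 1
        return alpha * a * (beta - 1) % p            # Step 4
    d = 2                                            # Step 7: a quadratic non-residue
    while pow(d, (p - 1) // 2, p) != p - 1:
        d += 1
    alpha = alpha * pow(d, (p - 9) // 8, p) % p      # Step 8: alpha := alpha * d^((p-9)/8)
    beta = 2 * a * d * d * alpha * alpha % p         # Step 9: beta := 2*a*d^2*alpha^2
    return alpha * a * d * (beta - 1) % p            # Step 10: b := alpha*a*d*(beta - 1)

p = 41                                               # 41 = 9 (mod 16)
for a in (23, 9):                                    # 23 = 8^2 hits Case (I), 9 = 3^2 hits Case (II)
    b = mueller_sqrt(a, p)
    assert b * b % p == a
```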

We will express $\sqrt{a}$ as $\sqrt{a} = \alpha a(\beta - 1)d^{norm}$, where $\beta^2 = -1$, $d$ is a quadratic non-residue modulo $p$, $norm \geq 0$, and $2ad^{2\cdot norm}\alpha^2 = \beta$. Indeed, in this case, $(\alpha a(\beta - 1)d^{norm})^2 = a(-2ad^{2\cdot norm}\alpha^2\beta) = a(-\beta^2) = a$. Moreover, in order to easily determine $\alpha$, it will be convenient that $\beta$ has the form $\beta = (2ad^{2\cdot norm})^k$, with $k$ odd.

The key point of our generalization is

Base Case: $(2ad^{2\cdot norm})^{\frac{p-1}{2^{s-1}}} = -1$, for some $norm \geq 0$.

In this case, because $\frac{p-1}{2^s}$ is odd, we can choose $\beta$ as $\beta = (2ad^{2\cdot norm})^{\frac{p-1}{2^s}}$ and $\alpha$ as

$$\alpha = (2ad^{2\cdot norm})^{\frac{\frac{p-1}{2^s}-1}{2}} = (2ad^{2\cdot norm})^{\frac{p-(2^s+1)}{2^{s+1}}} = (2ad^{2\cdot norm})^{\frac{t-1}{2}}.$$

In contrast to Müller's impractical attempt of further generalizing Atkin's approach ([15, Remark 2]), we focus on finding an adequate value for $norm$, the exponent of $d$ such that the Base Case is satisfied.

In order to derive the value of $norm$, we use the following results:

Theorem 4.1. Let $p$ be an odd prime, $p - 1 = 2^s t$ ($s \geq 3$, $t$ odd), $a$ a quadratic residue modulo $p$, and $d$ a quadratic non-residue modulo $p$. Then, for all $1 \leq i \leq s-1$, the following statement holds:

$$(\exists\, norm \in \mathbb{N})\left((2ad^{2\cdot norm})^{\frac{p-1}{2^i}} = 1\right) \Rightarrow (\exists\, norm' \in \mathbb{N})\left((2ad^{2\cdot norm'})^{\frac{p-1}{2^{s-1}}} = -1\right)$$


Proof:
We use induction on $i$.

Initial Case. For $i = s-1$ the reasoning is very simple. If there is a natural number $norm$ such that $(2ad^{2\cdot norm})^{\frac{p-1}{2^{s-1}}} = 1$ then, using that $d^{\frac{p-1}{2}} = -1$ (or, $(d^{2^{s-2}})^{\frac{p-1}{2^{s-1}}} = -1$), we obtain that $(2ad^{2\cdot norm}d^{2^{s-2}})^{\frac{p-1}{2^{s-1}}} = -1$, and, furthermore, $(2ad^{2\cdot(norm+2^{s-3})})^{\frac{p-1}{2^{s-1}}} = -1$. Thus, we may choose $norm' = norm + 2^{s-3}$.

Inductive Case. Let us consider an arbitrary number $i$, $1 \leq i < s-1$. We assume that the statement holds for the case $i+1$ and we will prove it for the case $i$.

If there is a natural number $norm$ such that $(2ad^{2\cdot norm})^{\frac{p-1}{2^i}} = 1$, or, $\left((2ad^{2\cdot norm})^{\frac{p-1}{2^{i+1}}}\right)^2 = 1$, then $(2ad^{2\cdot norm})^{\frac{p-1}{2^{i+1}}} = \pm 1$. We have two cases:

– If $(2ad^{2\cdot norm})^{\frac{p-1}{2^{i+1}}} = 1$ then, using the inductive hypothesis, we directly obtain that $(\exists\, norm' \in \mathbb{N})((2ad^{2\cdot norm'})^{\frac{p-1}{2^{s-1}}} = -1)$;

– If $(2ad^{2\cdot norm})^{\frac{p-1}{2^{i+1}}} = -1$ then, using that $d^{\frac{p-1}{2}} = -1$ (or, equivalently, $(d^{2^i})^{\frac{p-1}{2^{i+1}}} = -1$), we obtain that $(2ad^{2\cdot norm}d^{2^i})^{\frac{p-1}{2^{i+1}}} = 1$, and, furthermore, $(2ad^{2\cdot(norm+2^{i-1})})^{\frac{p-1}{2^{i+1}}} = 1$.

Finally, using the inductive hypothesis, we obtain that the required statement holds. ⊓⊔

The previous theorem leads to the following:

Corollary 4.2. Let $p$ be an odd prime, $p - 1 = 2^s t$ ($s \geq 2$, $t$ odd), $a$ a quadratic residue modulo $p$, and $d$ a quadratic non-residue modulo $p$. Then there exists $norm \in \mathbb{N}$ such that

$$(2ad^{2\cdot norm})^{\frac{p-1}{2^{s-1}}} = -1.$$

Proof:
For $s = 2$, we obtain directly $norm = 0$, because in this case $2$ is a quadratic non-residue modulo $p$ and the relation $(2a)^{\frac{p-1}{2}} = -1$ holds.

For $s \geq 3$, $2$ is a quadratic residue and thus we have $(2a)^{\frac{p-1}{2}} = 1$. Using Theorem 4.1 for $i = 1$ (with $norm = 0$), we obtain that there is $norm' \in \mathbb{N}$ such that $(2ad^{2\cdot norm'})^{\frac{p-1}{2^{s-1}}} = -1$. ⊓⊔

Therefore, all other possible cases can be recursively reduced to the Base Case as presented above.

In order to further clarify the points made so far, we will now give an algorithmic description of our generalization. We will use a special subroutine named FindPlace (presented in Figure 3), in which, starting with certain values for $a$ and $norm$ that satisfy $(2ad^{2\cdot norm})^{\frac{p-1}{2^i}} = 1$, for some $i$, we will search for a place $j$ as close as possible to $s-1$ such that $temp = (2ad^{2\cdot norm})^{\frac{p-1}{2^j}} = \pm 1$.

Furthermore, we will also formulate the Base Case as a subroutine in Figure 4.

Finally, the main part of our algorithm is presented in Figure 5.


FindPlace(a, norm)

begin
1. if norm = 0 then temp := (2a)^t
2. else temp := (2ad^(2·norm))^t;
3. j := s;
4. repeat
5.   j := j - 1;
6.   temp := temp^2;
7. until (temp = 1 ∨ temp = -1)
8. return (j, temp)
end.

Figure 3: FindPlace Subroutine

BaseCase(a, norm)

begin
1. α := (2ad^(2·norm))^((t-1)/2);
2. β := (2ad^(2·norm))α^2;
3. b := αa(β - 1)d^norm;
4. return b
end.

Figure 4: BaseCase Subroutine

Remark 4.3. For the clarity of the presentation, we believe it is also necessary to make some comments and prove some statements on the Generalized Atkin Algorithm and its subroutines:

1. The variable $norm$ contains the current value of the normalization exponent.

2. Some useful properties of the subroutine FindPlace are presented next:

(a) If the outputted value $j$ of the subroutine FindPlace is not equal to $s-1$, then the corresponding value $temp$ will be $-1$.

Proof:
Because $j < s-1$, at least two iterations of the repeat-until loop have been performed (because initially $j = s$ and then $j$ is decremented in each iteration). If we assume by contradiction that the final value of $temp$ is $1$, then the previous value $temp'$ satisfies $temp' = \pm 1$ (because $temp = temp'^2$ in Step 6), and, thus, the algorithm had to terminate at the previous iteration. ⊓⊔

(b) Let $(j, temp)$ and $(j', temp')$ be the outputs of two consecutive calls of the subroutine FindPlace. Then $j < j'$.


Generalized Atkin Algorithm(p, a)

input: p prime such that p ≡ 1 mod 4, a ∈ Z_p a quadratic residue;
output: b, a square root of a modulo p;

begin
1. determine s ≥ 2 and t odd such that p - 1 = 2^s t;
2. generate d, a quadratic non-residue modulo p;
3. norm := 0;
4. (j, temp) := FindPlace(a, norm);
5. while (j < s - 1)
6. begin
7.   norm := norm + 2^(j-2);
8.   (j, temp) := FindPlace(a, norm);
9. end
10. if (temp = -1) then BaseCase(a, norm)
11. if (temp = 1) then
12. begin
13.   norm := norm + 2^(s-3);
14.   BaseCase(a, norm);
15. end
end.

Figure 5: Generalized Atkin Algorithm

Proof:
Let us first point out that $j < s-1$ (otherwise, if $j = s-1$, there will not be another call of FindPlace, since the algorithm will end with a call of BaseCase), which implies that $j + 1 \leq s-1$. Therefore, we obtain $temp = (2ad^{2\cdot norm})^{\frac{p-1}{2^j}} = -1$. Furthermore, we have $(2ad^{2\cdot norm'})^{\frac{p-1}{2^j}} = 1$ (since $norm' = norm + 2^{j-2}$ and $(d^{2^{j-1}})^{\frac{p-1}{2^j}} = d^{\frac{p-1}{2}} = -1$), which implies that $(2ad^{2\cdot norm'})^{\frac{p-1}{2^{j+1}}} = \pm 1$, leading to $j + 1 \leq j'$ (because $j'$ is the greatest element less than or equal to $s-1$ such that $(2ad^{2\cdot norm'})^{\frac{p-1}{2^{j'}}} = \pm 1$). ⊓⊔

3. If $p \equiv 5 \bmod 8$, i.e., $s = 2$, then FindPlace will be called exactly once (with $a$ and $norm = 0$) and it will output $j = s-1 = 1$ and $temp = -1$; in this case, the subroutine BaseCase will directly lead to the final result (no normalization is required). Thus, we have obtained Atkin's algorithm as a particular case of our algorithm.

4. If $p \equiv 9 \bmod 16$, i.e., $s = 3$, then FindPlace will be called exactly once (with $a$ and $norm = 0$) and it will output $j = s-1 = 2$ and $temp = \pm 1$. Two subcases are possible:

• In case $temp = -1$, the subroutine BaseCase will lead directly to the final result (no normalization is required);


• In case $temp = 1$, the normalization exponent will be updated as $norm = 0 + 2^{3-3} = 1$ and the subroutine BaseCase will be called. Consequently, the final result will be computed as $b := \alpha a(\beta - 1)d^1$ (Step 3 of BaseCase).

Thus, we have obtained Müller's algorithm as a particular case of our algorithm.
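To make the preceding description concrete, the following is a compact Python transcription of Figures 3-5. This is our own sketch (the paper gives pseudocode only); for simplicity the quadratic non-residue $d$ is found by trial search rather than random generation, and the function names are ours.

```python
def generalized_atkin_sqrt(a, p):
    """Square root of a quadratic residue a modulo a prime p with p = 1 (mod 4)."""
    s, t = 0, p - 1                              # Step 1: p - 1 = 2^s * t, t odd
    while t % 2 == 0:
        s, t = s + 1, t // 2
    assert s >= 2
    d = 2                                        # Step 2: a quadratic non-residue d
    while pow(d, (p - 1) // 2, p) != p - 1:
        d += 1

    def find_place(norm):                        # Figure 3: FindPlace(a, norm)
        temp = pow(2 * a * pow(d, 2 * norm, p) % p, t, p)
        j = s
        while True:                              # repeat ... until temp = +/-1
            j, temp = j - 1, temp * temp % p
            if temp == 1 or temp == p - 1:
                return j, temp

    def base_case(norm):                         # Figure 4: BaseCase(a, norm)
        base = 2 * a * pow(d, 2 * norm, p) % p
        alpha = pow(base, (t - 1) // 2, p)
        beta = base * alpha * alpha % p
        return alpha * a * (beta - 1) * pow(d, norm, p) % p

    norm = 0                                     # Figure 5: main loop
    j, temp = find_place(norm)
    while j < s - 1:
        norm += 1 << (j - 2)
        j, temp = find_place(norm)
    if temp == 1:                                # otherwise temp = -1 and no adjustment is needed
        norm += 1 << (s - 3)
    return base_case(norm)

p, a = 12289, 2564                               # the pair used later in Example 5.1 (s = 12, t = 3)
b = generalized_atkin_sqrt(a, p)
assert b * b % p == a
```

For $p \equiv 5 \bmod 8$ the while loop is never entered and $norm$ stays $0$, which is exactly the Atkin special case noted in item 3 above.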

5. Efficient Implementation and Performance Analysis

We start with the average-case and worst-case complexity analysis of our initial algorithm and then we discuss several improvements for efficient implementation. Finally, we present several comparisons with the most important generic square root computing methods, namely Tonelli-Shanks and Cipolla-Lehmer.

5.1. Average-Case and Worst-Case Complexity Analysis

We will consider the cases $s \geq 4$ (for $s = 2$ and $s = 3$ we obtain Atkin's algorithm and Müller's algorithm, respectively, whose complexities have been discussed in Section 3). Our algorithm determines the value of $norm$ by calling the subroutine FindPlace for each 1 digit in the binary expression of $norm$. Therefore, the algorithm makes $Hw(norm)$ calls to FindPlace, where $Hw(x)$ denotes the Hamming weight of $x$ (i.e., the number of 1's in $x$).

Let E denote one exponentiation, M one multiplication, and S one squaring (all these operations are performed modulo $p$). Our subroutines will involve:

• FindPlace: if the output is $(j, temp)$, then at most $2E + 1M + (s-j)S$;

• BaseCase: at most $3E + 6M + 1S$.

We exclude the complexity of generating a quadratic non-residue $d$. All the other computations can be considered negligible (if $norm$ is represented in base 2 then the step $norm := norm + 2^{j-2}$ implies only setting a certain bit to 1). In the average case, we have $Hw(norm) = \frac{s-2}{2}$, which means that our algorithm will include $\frac{s-2}{2}$ calls to FindPlace and a call to BaseCase. Thus, the total number of operations is, on average, the following:

$$\frac{s-2}{2}(2E + 1M) + \frac{\sum_{j=2}^{s-1}(s-j)}{2}S + 3E + 6M + 1S = (s-2)E + \frac{s-2}{2}M + \frac{(s-1)(s-2)}{4}S + 3E + 6M + 1S = (s+1)E + \frac{s+10}{2}M + \frac{s^2-3s+6}{4}S$$

In contrast, in the worst case, $norm$ will have $Hw(norm) = s-2$ (i.e., all the bits of $norm$ will be equal to 1), resulting in $(s-2)$ calls to FindPlace and a call to BaseCase. The total number of operations now becomes the following:

$$(s-2)(2E + 1M) + \sum_{j=2}^{s-1}(s-j)S + 3E + 6M + 1S = 2(s-2)E + (s-2)M + \frac{(s-1)(s-2)}{2}S + 3E + 6M + 1S = (2s-1)E + (s+4)M + \frac{s^2-3s+4}{2}S$$

Once more, we do not count the generation of a quadratic non-residue $d$. Consequently, both the average-case and the worst-case complexity of our initial algorithm are in $O((\log_2 p)^4)$.
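For quick reference, the two closed-form operation counts just derived can be evaluated programmatically; the small helper below (ours) simply transcribes them.

```python
def initial_algorithm_cost(s, worst=False):
    """(exponentiations, multiplications, squarings) of the initial algorithm."""
    if worst:
        return 2 * s - 1, s + 4, (s * s - 3 * s + 4) / 2
    return s + 1, (s + 10) / 2, (s * s - 3 * s + 6) / 4

print(initial_algorithm_cost(12))        # average case for s = 12: (13, 11.0, 28.5)
print(initial_algorithm_cost(12, True))  # worst case for s = 12:   (23, 16, 56.0)
```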


5.2. More Efficient Implementation

It is obvious that several steps (especially the steps that involve exponentiations) of our algorithm can be performed much more efficiently compared to their raw implementation. A first solution is to precompute several powers of $d$, to keep track of $d^{norm}$ and $(d^{norm})^{\frac{t-1}{2}}$ as $norm$ is updated, and to efficiently recompute the value of $temp$ from Step 2 of FindPlace by using the previous values. Moreover, in this case, the final exponentiations (from BaseCase) can also be performed efficiently.

We will now examine the computations behind our algorithm more closely, in order to point out possible improvements, if precomputations can be afforded. We begin by defining the elements $D_j$, $0 \leq j \leq s-1$, $D_j = D^{2^j}$, $D = d^t$, and $A_j$, $0 \leq j \leq s-2$, $A_j = A^{2^j}$, $A = (2a)^t$. Let us denote $A_j \cdot D^{temp\_norm}$ by $\langle j, temp\_norm \rangle$, where $temp\_norm$ is in binary form. In our algorithm, we determine the value of $norm = (f_{s-3} \ldots f_0)_2$ by successively computing the digits $f_0, f_1, \ldots, f_{s-3}$, so that:

$$T_{s-2} = \langle s-2,\ f_0\underbrace{0\ldots0}_{s-1\ \text{0s}} \rangle = 1$$
$$T_{s-3} = \langle s-3,\ f_1 f_0\underbrace{0\ldots0}_{s-2\ \text{0s}} \rangle = 1$$
$$\vdots$$
$$T_2 = \langle 2,\ f_{s-4}\ldots f_1 f_0\, 000 \rangle = 1$$
$$T_1 = \langle 1,\ f_{s-3}\ldots f_1 f_0\, 00 \rangle = -1$$

We remark that the last element $T_1$ is exactly the element from the Base Case: $(2ad^{2\cdot norm})^{\frac{p-1}{2^{s-1}}}$. The reader will notice that for any $0 \leq j \leq s-2$, we would compute $T_j = A_j \cdot D^{(f_{s-2-j}\ldots f_1 f_0 0\ldots 0)_2}$ in a naive manner, by multiplying $A_j$ with all the $(2^i)$th powers of $D$ corresponding to the 1 bits from $f_{s-2-j}\ldots f_1 f_0 0\ldots 0$. In order to reduce the number of modular multiplications, let us choose a fixed, small integer $k$, with $k \geq 1$, and consider the $k$ terms $T_j, T_{j-1}, \ldots, T_{j-k+1}$, where $j - k + 1 > 1$. We obtain the following sequence:

$$T_j = \langle j,\ f_{s-2-j} f_{s-3-j}\ldots f_1 f_0\underbrace{0\ldots0}_{j+1\ \text{0s}} \rangle = 1$$
$$T_{j-1} = \langle j-1,\ f_{s-2-j+1} f_{s-2-j} f_{s-3-j}\ldots f_1 f_0\underbrace{0\ldots0}_{j\ \text{0s}} \rangle = 1$$
$$\vdots$$
$$T_{j-k+2} = \langle j-k+2,\ f_{s-2-j+k-2}\ldots f_{s-2-j} f_{s-3-j}\ldots f_1 f_0\underbrace{0\ldots0}_{j-k+3\ \text{0s}} \rangle = 1$$
$$T_{j-k+1} = \langle j-k+1,\ f_{s-2-j+k-1}\ldots f_{s-2-j} f_{s-3-j}\ldots f_1 f_0\underbrace{0\ldots0}_{j-k+2\ \text{0s}} \rangle = 1$$


Importantly, the sequence $f_{s-3-j}\ldots f_1 f_0$ appears in all of the above terms. Let us denote the term $D^{(f_{s-3-j}\ldots f_1 f_0\overbrace{0\ldots0}^{j-k+2\ \text{0s}})_2}$ by $aux$.

We notice that $T_j, T_{j-1}, \ldots, T_{j-k+1}$ can be computed in the following way, once we have determined $T_{j+1}$ (and, implicitly, $f_{s-3-j}, \ldots, f_1, f_0$):

$$T_j = A_j \cdot \underbrace{D_{s-1}^{f_{s-2-j}}}_{S_0} \cdot aux^{2^{k-1}}$$
$$T_{j-1} = A_{j-1} \cdot \underbrace{D_{s-1}^{f_{s-2-j+1}}}_{S_1} \cdot \underbrace{D_{s-2}^{f_{s-2-j}}}_{S_0} \cdot aux^{2^{k-2}}$$
$$\vdots$$
$$T_{j-k+2} = A_{j-k+2} \cdot \underbrace{D_{s-1}^{f_{s-2-j+k-2}}}_{S_{k-2}} \cdots \underbrace{D_{s-k+2}^{f_{s-2-j+1}}}_{S_1} \cdot \underbrace{D_{s-k+1}^{f_{s-2-j}}}_{S_0} \cdot aux^{2}$$
$$T_{j-k+1} = A_{j-k+1} \cdot \underbrace{D_{s-1}^{f_{s-2-j+k-1}}}_{S_{k-1}} \cdot \underbrace{D_{s-2}^{f_{s-2-j+k-2}}}_{S_{k-2}} \cdots \underbrace{D_{s-k+1}^{f_{s-2-j+1}}}_{S_1} \cdot \underbrace{D_{s-k}^{f_{s-2-j}}}_{S_0} \cdot aux$$

We added underbraces with subscripts to the terms in order to highlight the fact that it is useful to see terms which have the same exponent as being part of a larger set. We will show how we can efficiently generate the powers of $aux$ and the sets $S_w$, for $0 \leq w \leq k-1$.

Firstly, we compute $aux$ in the regular manner, and then $aux^{2^i}$, for $1 \leq i \leq k-1$, through $k-1$ modular squarings. This way, we use only $k-1$ modular squarings, instead of $(k-1) \cdot Hw(f_{s-3-j}\ldots f_1 f_0)$ regular modular multiplications.

Secondly, we still have to compute the sets of terms $S_w = \{D_{s-k+w+z}^{f_{s-2-j+w}} \mid 0 \leq z \leq k-1-w\}$, for $0 \leq w \leq k-1$. Intuitively, the set $S_w$ contains all the terms located $w$ positions below the main diagonal, for $0 \leq w \leq k-1$. For each $w$, if $f_{s-2-j+w} = 0$ (as $T_{s-2-j+w} = 1$), we do not have to compute anything, because $S_w = \{1\}$, while if $f_{s-2-j+w} = 1$ (as $T_{s-2-j+w} = -1$), each set can be easily generated by taking $D_{s-k+w}$ and applying $k-w-1$ modular squarings.

Inner Loop

1. compute aux
2. set T_(j-k+1) := A_(j-k+1) · aux;
3. for w = 0 to k-1 do:
4.   determine f_(s-2-j+w)
5.   update T_(j-k+1) by setting T_(j-k+1) := T_(j-k+1) · D_(s-k+w)
6.   for i = 2 to k-w-1 do:
7.     update T_(j-k+i) by setting T_(j-k+i) := T_(j-k+i-1)^2

Figure 6: Inner Loop


Thirdly, by storing the terms $T_j, T_{j-1}, \ldots, T_{j-k+1}$ we can efficiently combine the two aforementioned improvements, as presented in Figure 6.

For each value of $w$ between $0$ and $k-1$, running the inner loop generates both the necessary powers of $aux$ and the set $S_w$. Once the outer loop is completed, we have determined $k$ new digits from the binary representation of $norm$. We repeat this procedure until we know all the bits of $norm$.

Finally, we can also simplify the last computations of our initial algorithm. The standard procedure would be to first calculate the terms $\alpha = (2ad^{2\cdot norm})^{\frac{t-1}{2}}$ and $\beta = (2ad^{2\cdot norm})\alpha^2$, and then to generate the square root $b = \alpha a(\beta - 1)d^{norm}$. However, if we elaborate the expression of $b$ we obtain

$$b = a\alpha d^{norm}(\beta - 1) = a(2a)^{\frac{t-1}{2}}d^{2\cdot norm\cdot\frac{t-1}{2}}d^{norm}\left((2a)^t(d^{2\cdot norm})^t - 1\right) = a(2a)^{\frac{t-1}{2}}d^{t\cdot norm}\left((2a)^t(d^{t\cdot norm})^2 - 1\right)$$

Once we have computed $(2a)^{\frac{t-1}{2}}$, we can then easily modify the final run of the inner loop in order to generate $D^{norm} = (d^t)^{norm}$ and to compute the value of $b$. When combined, our suggestions lead to a significantly improved version of our initial algorithm. The precomputation stage is as follows:

Precomputation(p, a, k)

input: p prime such that p ≡ 1 mod 4, p - 1 = 2^s t, t odd;
       k, 1 ≤ k ≤ s, a precomputation parameter;
output: D_j, 0 ≤ j ≤ s-1, D_j = D^(2^j), D = d^t, d a quadratic non-residue;
        aux_A = (2a)^((t-1)/2), A = (2a)^t and A_(s-1-k·i), 1 ≤ i ≤ q,
        where q = ⌊(s-2)/k⌋, A_(s-1-k·i) = (2a)^(t·2^(s-1-k·i))

begin
1. generate and store d (by any means available);
2. compute and store D and D_j, 0 ≤ j ≤ s-1 (by square-and-multiply exponentiation);
3. compute and store aux_A, A and A_(s-1-k·i), 1 ≤ i ≤ q (by square-and-multiply exponentiation);
end

Figure 7: Precomputation Subroutine

We have precomputed all the required powers of $D$, but only certain powers of $A$. It is not necessary to keep all the powers of $A$, since the missing powers can be generated as they are needed. This is because the algorithm behind the third improvement uses only one stored power of $A$ and implicitly employs $k-1$ other powers of $A$, which are kept only for the duration of the outer loop. We have also computed the term $aux_A = (2a)^{\frac{t-1}{2}}$, which is part of the final improvement. Moreover, we assume that we have enough memory capacity to store $k$ numbers, namely $ACC_h$, where $1 \leq h \leq k$. These numbers are exactly $T_j, T_{j-1}, \ldots, T_{j-k+1}$, as used in the description of the Inner Loop.
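As an illustration of the precomputation stage, here is a small Python sketch of our reading of Figure 7; the paper leaves the generation of $d$ open, and we simply use trial search here.

```python
def precompute(p, s, t, a, k):
    d = 2                                         # Step 1: a quadratic non-residue (trial search)
    while pow(d, (p - 1) // 2, p) != p - 1:
        d += 1
    D = pow(d, t, p)
    Dj = [pow(D, 1 << j, p) for j in range(s)]    # Step 2: D_j = D^(2^j), 0 <= j <= s-1
    A = pow(2 * a, t, p)                          # Step 3: A = (2a)^t
    auxA = pow(2 * a, (t - 1) // 2, p)            #         aux_A = (2a)^((t-1)/2)
    q = (s - 2) // k
    Ai = {s - 1 - k * i: pow(A, 1 << (s - 1 - k * i), p) for i in range(1, q + 1)}
    return d, Dj, A, auxA, Ai

# For p = 12289, s = 12, t = 3, a = 2564, k = 3 this reproduces the D_i and A_i values
# of Example 5.1 provided d = 19 is used; the trial search above may pick a different d.
```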


The main part of our improved algorithm is presented in Figure 8. Before running the actual algorithm, the Precomputation subroutine must be called. Note, however, that if $p$ is a priori known, Steps 1 and 2 from Precomputation need to be performed only once (and this may be done in advance), while Step 3 must be repeated for each $a$.

Improved Generalized Atkin Algorithm(p, a, k)

input: p prime such that p ≡ 1 mod 4, p - 1 = 2^s t, t odd;
       a ∈ Z_p a quadratic residue;
output: b, a square root of a modulo p;

begin
1. step := s - 2;
2. aux_norm := 0 = (e_(s-1) ... e_0)_2;   (the final aux_norm is 4·norm)
3. q := ⌊(s-2)/k⌋;
4. rem := (s-2) mod k + 1;
5. for i = 1 to q do
6. begin
7.   Complete Accumulator Update(step, k);
8.   Complete Inner Loop(step, k);
9. end
10. Final Accumulator Update and Inner Loop(step, k, rem);
11. b := a · aux_A · aux_ACC · (A · aux_ACC^2 - 1);
12. return b
end.

Figure 8: Improved Generalized Atkin Algorithm

The first two subroutines correspond to the Inner Loop (described in Figure 6) in the following manner:

• Complete Accumulator Update (presented in Figure 9) implements Steps 1 and 2, computing $aux$ and $T_{j-k+1}$.

• Complete Inner Loop (presented in Figure 10) implements the loop in Steps 3 through 7, computing the bits $f_{s-2-j}, \ldots, f_{s-3-j+k}$.

Final Accumulator Update and Inner Loop (presented in Figures 11 and 12) is an incomplete combination of a Complete Accumulator Update and a Complete Inner Loop, for determining the remaining bits of $norm$, since $s-2$ may not be an exact multiple of $k$. Moreover, a slight adjustment is made in order to obtain the term $aux_{ACC} = D^{norm} = (d^t)^{norm}$. We consider the case $s = 2$ separately and set $aux_{ACC} := 1$, since this case does not fit in the general framework. For $s > 2$, the last part of this subroutine (Steps 15-34) computes the term $T_1$, which must be treated individually, as $T_1 = -1$ while all other $T_i$'s are equal to 1, for $2 \leq i \leq s-2$.


Complete Accumulator Update(step, k)

begin
1. ACC_1 := A_(step-k+1) · ∏_{j=step+2}^{s-1} (D_(j-k))^(e_j);
2. for j = 2 to k do ACC_j := ACC_(j-1)^2;
end

Figure 9: Complete Accumulator Update Subroutine

Complete Inner Loop(step, k)

begin
1. for j = k downto 1 do
2. begin
3.   aux_norm := aux_norm / 2;
4.   if ACC_j = -1 then
5.   begin
6.     ACC_1 := ACC_1 · D_(s-j);
7.     for h = 2 to j-1 do ACC_h := ACC_(h-1)^2;
8.     e_(s-1) := 1;
9.   end
10.  step := step - 1;
11. end
end

Figure 10: Complete Inner Loop Subroutine


Final Accumulator Update and Inner Loop(step, k, rem)

begin
1. aux_ACC := ∏_{j=step+2}^{s-1} (D_(j-step-2))^(e_j);
2. ACC_1 := A · aux_ACC^2;
3. for j = 2 to rem do ACC_j := ACC_(j-1)^2;
4. for j = rem-1 downto 2 do
5. begin
6.   aux_norm := aux_norm / 2;
7.   if ACC_j = -1 then
8.   begin
9.     aux_ACC := aux_ACC · D_(s-1-j);
10.    ACC_1 := A · aux_ACC^2;
11.    for h = 2 to j-1 do ACC_h := ACC_(h-1)^2;
12.    e_(s-1) := 1;
13.  end
14. end
15. if rem = 1 then
16.   if s = 2 then aux_ACC := 1;
17.   else
18.     if e_(s-1) = 0 then
19.     begin
20.       aux_ACC := aux_ACC · D_(s-3);
21.       e_(s-1) := 1;
22.     end
23.     else begin
24.       aux_ACC := aux_ACC · D_(s-3) · D_(s-2) · D_(s-1);
25.       e_(s-1) := 0;
26.     end

Figure 11: Final Accumulator Update and Inner Loop Subroutine


27. else begin
28.   aux_norm := aux_norm / 2;
29.   if ACC_2 = 1 then
30.   begin
31.     aux_ACC := aux_ACC · D_(s-3);
32.     e_(s-1) := 1;
33.   end
34. end
end

Figure 12: Final Accumulator Update and Inner Loop Subroutine (continued)

Example 5.1 illustrates the application of our improved algorithm.

Example 5.1. Let us consider $p = 12289$ ($s = 12$, $t = 3$) and $a = 2564$ ($2564$ is a quadratic residue modulo $12289$). We choose $d = 19$, $k = 3$ (therefore, $q = 3$), and obtain the following values:

i      0      1      2      3      4      5
D_i    6859   3589   2049   7852   12280  81
A_i    8835   9786   9908   3932   1062   9545

i      6      7      8      9      10     11
D_i    6561   10643  5736   4043   1479   12288
A_i    8668   11567  5146   10810  12288  -

However, we will only store the $D_i$'s, for $0 \leq i \leq 11$, as well as $A_0$, $A_2$, $A_5$, $A_8$ and $aux_A = 5128$.

We obtain $step = 10$, $aux_{norm} = 0$ and $rem = 2$.

For $i = 1$, we update the accumulators so that $ACC_1 = A_8 = 5146$, $ACC_2 = A_9 = 10810$ and $ACC_3 = A_{10} = -1$.

Entering the Complete Inner Loop, we have:

• since $ACC_3 = -1$, we have $ACC_1 = ACC_1 \cdot D_9 = 1$ and $ACC_2 = ACC_1^2 = 1$. Moreover, $e_{11} = 1$, $aux_{norm} = 0/2 + 2048 = 2048$ and $step = 9$;

• since $ACC_2 = 1$, we get $aux_{norm} = 2048/2 = 1024$ and $step = 8$;

• since $ACC_1 = 1$, we get $aux_{norm} = 1024/2 = 512$ and $step = 7$;

For $i = 2$, we update the accumulators so that $ACC_1 = 1$, $ACC_2 = 1$ and $ACC_3 = 1$.

Entering the Complete Inner Loop, we have:


• since $ACC_3 = 1$, we get $aux_{norm} = 512/2 = 256$ and $step = 6$;

• since $ACC_2 = 1$, we get $aux_{norm} = 256/2 = 128$ and $step = 5$;

• since $ACC_1 = 1$, we get $aux_{norm} = 128/2 = 64$ and $step = 4$;

For $i = 3$, we update the accumulators so that $ACC_1 = 8246$, $ACC_2 = 1479$ and $ACC_3 = -1$.

Entering the Inner Loop, we have:

• since $ACC_3 = -1$, we have $ACC_1 = ACC_1 \cdot D_9 = 10810$ and $ACC_2 = ACC_1^2 = -1$. Furthermore, $e_{11} = 1$, $aux_{norm} = 64/2 + 2048 = 2080$ and $step = 3$;

• since $ACC_2 = -1$, we have $ACC_1 = ACC_1 \cdot D_{10} = 1$. Furthermore, $e_{11} = 1$, $aux_{norm} = 2080/2 + 2048 = 3088$ and $step = 2$;

• since $ACC_1 = 1$, we get $aux_{norm} = 3088/2 = 1544$ and $step = 1$;

We now perform the Final Accumulator Update and Inner Loop. Thus, we obtain $aux_{ACC} = D_0 \cdot D_6 \cdot D_7 = 1490$, $ACC_1 = A \cdot aux_{ACC}^2 = -1$ and $ACC_2 = ACC_1^2 = 1$. Since $ACC_2 = 1$, we obtain $e_{11} = 1$, $aux_{norm} = 1544/2 + 2048 = 2820$ (thus, $norm = 2820/4 = 705$) and $aux_{ACC} = aux_{ACC} \cdot D_9 = 2460$. The final computation gives us $b = a \cdot aux_A \cdot aux_{ACC} \cdot (A \cdot aux_{ACC}^2 - 1) = 2564 \cdot 5128 \cdot 2460 \cdot (10810 - 1) = 253$.
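The numbers above are easy to re-check mechanically; the following few lines (ours) verify the key claims of Example 5.1.

```python
p, a, d, norm, b = 12289, 2564, 19, 705, 253
assert p - 1 == 2**12 * 3                                 # s = 12, t = 3
assert pow(d, (p - 1) // 2, p) == p - 1                   # d = 19 is a quadratic non-residue
assert b * b % p == a                                     # 253^2 = 2564 (mod 12289)
# Base Case condition: (2*a*d^(2*norm))^((p-1)/2^(s-1)) = -1 for norm = 705
assert pow(2 * a * pow(d, 2 * norm, p) % p, (p - 1) // 2**11, p) == p - 1
```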

5.3. Average-Case and Worst-Case Complexity Analysis for the Improved Algorithm

In the average case, we obtain the following complexities, based on the fact that $norm$ has around $s/2$ bits equal to 1 in its representation:

• If $p$ is a priori known, Precomputation takes 1E for the terms involving $A$ (the computation of the terms involving $D$ can be performed in advance). If $p$ is not a priori known, Precomputation takes 2E, which means 1E for the terms involving $D$ and 1E for the terms involving $A$.

• Complete Accumulator Update takes $\frac{s}{4}$M and $(k-1)$S (since, on average, we use $s/2$ bits from $norm$, each of which can be 0 or 1, with equal probability).

• Complete Inner Loop takes $\frac{k}{2}$M $+ \frac{k(k-1)}{4}$S (since we use $k$ bits from $norm$, each of which can be 0 or 1, with equal probability).

• Final Accumulator Update and Inner Loop takes $2\cdot\frac{k}{2}$M $+ \frac{s}{4}$M $+ (k-1)$S $+ \frac{k}{2}$M $+ \frac{k(k-1)}{4}$S.

• The final computation of $b$ takes 4M + 1S.

The estimate does not include the generation of a quadratic non-residue $d$. In general, the computation takes about $2E + \frac{s}{k}\left(\frac{s}{4}M + kS + \frac{k}{2}M + \frac{k(k-1)}{4}S\right) + (k+4)M + 1S$. This value is around $2E + \frac{3s}{4}S + \frac{s}{2}M + \frac{1}{4}\left(\frac{s^2}{k}\right)M + \frac{1}{4}(sk)S + kM$. Taking $k = \lceil\sqrt{s}\rceil$ (the optimal choice) leaves us with $2E + \frac{3s}{4}S + \frac{s}{2}M + \frac{1}{4}(s\lceil\sqrt{s}\rceil)M + \frac{1}{4}(s\lceil\sqrt{s}\rceil)S + \lceil\sqrt{s}\rceil M$.


If $p$ is a priori known, we obtain $1E + \frac{3s}{4}S + \frac{s}{2}M + \frac{1}{4}(s\lceil\sqrt{s}\rceil)M + \frac{1}{4}(s\lceil\sqrt{s}\rceil)S + \lceil\sqrt{s}\rceil M$. In this case, we need $s$ precomputed elements and memory for just $2\lceil\sqrt{s}\rceil$ additional elements.

Moving on to the worst case, we consider the fact that $norm$'s binary representation has roughly $s$ bits which are equal to 1. This results in the following complexities:

• Precomputation: same as for the average case.

• Complete Accumulator Update takes $\frac{s}{2}$M and $(k-1)$S (since, on average, we use $s/2$ bits from $norm$, and all of $norm$'s bits are equal to 1).

• Complete Inner Loop takes $k$M $+ \frac{k(k-1)}{2}$S (since we use $k$ bits from $norm$, and all of $norm$'s bits are equal to 1).

• Final Accumulator Update and Inner Loop takes $2k$M $+ \frac{s}{2}$M $+ (k-1)$S $+ k$M $+ \frac{k(k-1)}{2}$S.

• The final computation of $b$ takes 4M + 1S, the same as for the average case.

Again, we exclude the generation of a quadratic non-residue $d$. The computation takes at most about $2E + \frac{s}{k}\left(\frac{s}{2}M + kS + kM + \frac{k(k-1)}{2}S\right) + (2k+4)M + 1S$. This value is approximately $2E + \frac{s}{2}S + sM + \frac{1}{2}\left(\frac{s^2}{k}\right)M + \frac{1}{2}(sk)S + 2kM$. If we set $k = \lceil\sqrt{s}\rceil$ (the optimal choice), we have $2E + \frac{s}{2}S + sM + \frac{1}{2}(s\lceil\sqrt{s}\rceil)M + \frac{1}{2}(s\lceil\sqrt{s}\rceil)S + 2\lceil\sqrt{s}\rceil M$. If $p$ is a priori known, we obtain $1E + \frac{s}{2}S + sM + \frac{1}{2}(s\lceil\sqrt{s}\rceil)M + \frac{1}{2}(s\lceil\sqrt{s}\rceil)S + 2\lceil\sqrt{s}\rceil M$. Like in the average case, we will need $s$ precomputed elements and memory for just $2\lceil\sqrt{s}\rceil$ additional elements. Consequently, both the average-case and the worst-case complexity of our improved algorithm are in $O((\log_2 p)^{3.5})$.

5.4. Comparisons with Other Methods

In this section we will compare our algorithm with the most important square root algorithms, namely Tonelli-Shanks and Cipolla-Lehmer. After a short overview of these algorithms, we will put forward a computational comparison of the three algorithms.

5.4.1. Tonelli-Shanks Algorithm

The Tonelli-Shanks algorithm ([22], [20]) reduces the problem of computing a square root to another famous problem, namely the discrete logarithm problem: given a finite cyclic group $G$, a generator $\alpha$ of it, and an arbitrary element $\beta \in G$, determine the unique $k$, $0 \leq k \leq |G| - 1$, such that $\beta = \alpha^k$. The element $k$ will be referred to as the discrete logarithm of $\beta$ in base $\alpha$, denoted by $k = \log_\alpha \beta$. Although this problem is intractable in general, if the order of the group is smooth, i.e., its prime factors do not exceed a given bound, there is an efficient algorithm due to Pohlig and Hellman [16].

Let us consider an odd prime $p$, $p = 2^s t + 1$, with $s \geq 2$ and $t$ odd, $a$ a quadratic residue and $d$ a quadratic non-residue (modulo $p$). The Tonelli-Shanks algorithm is based on the following simple facts:

1. Let $\alpha = d^t$. Then $|\langle\alpha\rangle| = 2^s$, or, equivalently, $ord(\alpha) = 2^s$, where $\langle\alpha\rangle$ denotes the subgroup induced by $\alpha$, and $ord(\alpha)$ represents the order of $\alpha$ (in $\mathbb{Z}_p$).

2. Let $\beta = a^t$. Then $\beta \in \langle\alpha\rangle$ and $\log_\alpha \beta$ is even (this discrete logarithm is considered with respect to the subgroup induced by $\alpha$).


Thus, if we can determine $k$ such that $\beta = \alpha^k$, then $\sqrt{a}$ can be computed as $\sqrt{a} = a^{\frac{t+1}{2}}(d^{-1})^{\frac{k}{2}t}$. Indeed, $\left(a^{\frac{t+1}{2}}(d^{-1})^{\frac{k}{2}t}\right)^2 = a^{t+1}(d^{kt})^{-1} = a^{t+1}a^{-t} = a$.

Thus, the difficult part is finding $k$, the discrete logarithm of $\beta$ in base $\alpha$ (in the subgroup $\langle\alpha\rangle$ of order $2^s$). Tonelli and Shanks compute the element $k$ bit by bit. Lindhurst [14] has proven that the Tonelli-Shanks algorithm requires on average two exponentiations, $\frac{s^2}{4}$ multiplications, and two quadratic character evaluations, with worst-case complexity $O((\log_2 p)^4)$.
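For comparison purposes, a minimal Python sketch of the bit-by-bit computation just described is given below (our own illustration, not Bernstein's windowed variant; the names are ours and $d$ is found by trial search).

```python
def tonelli_shanks_sqrt(a, p):
    """Square root of a quadratic residue a modulo an odd prime p."""
    s, t = 0, p - 1                                  # p - 1 = 2^s * t, t odd
    while t % 2 == 0:
        s, t = s + 1, t // 2
    d = 2                                            # a quadratic non-residue
    while pow(d, (p - 1) // 2, p) != p - 1:
        d += 1
    alpha, beta = pow(d, t, p), pow(a, t, p)         # ord(alpha) = 2^s, beta in <alpha>
    inv_alpha = pow(alpha, p - 2, p)
    k = 0                                            # k = log_alpha(beta), recovered bit by bit
    for i in range(1, s):
        if pow(beta * pow(inv_alpha, k, p) % p, 2 ** (s - 1 - i), p) != 1:
            k += 2 ** i
    inv_d = pow(d, p - 2, p)
    return pow(a, (t + 1) // 2, p) * pow(inv_d, k * t // 2, p) % p

b = tonelli_shanks_sqrt(10, 13)
assert b * b % 13 == 10
```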

Bernstein [5] has proposed a method of computing $w$ bits of $k$ at a time. His algorithm involves an exponentiation and $\frac{s^2}{2w^2}$ multiplications, with a precomputation phase that additionally requires two quadratic character evaluations on average, an exponentiation, and about $\frac{2^w s}{w}$ multiplications, producing a table with $\frac{2^w s}{w}$ precomputed powers of $\alpha$.

5.4.2. Cipolla-Lehmer Algorithm

The following square root algorithm is due to Cipolla [6] and Lehmer [12]. Cipolla's method is based on arithmetic in quadratic extension fields, which is briefly recalled below.

Let us consider an odd prime $p$ and $a$ a quadratic residue modulo $p$. We first generate an element $z \in \mathbb{Z}_p$ such that $z^2 - a$ is a quadratic non-residue. The extension field $\mathbb{Z}_p(\sqrt{z^2-a})$ is constructed as follows:

• its elements are pairs $(x, y) \in \mathbb{Z}_p^2$;

• the addition is defined as $(x, y) + (x', y') = (x + x', y + y')$;

• the multiplication is defined as $(x, y) \cdot (x', y') = (xx' + yy'(z^2-a), xy' + x'y)$;

• the additive identity is $(0, 0)$, and the multiplicative identity is $(1, 0)$;

• the additive inverse of $(x, y)$ is $(-x, -y)$ and its multiplicative inverse is $(x(x^2 - y^2(z^2-a))^{-1}, -y(x^2 - y^2(z^2-a))^{-1})$.

Cipolla has remarked that a square root of $a$ can be computed using the fact that $(z, 1)^{\frac{p+1}{2}} = (\sqrt{a}, 0)$, and his method requires two quadratic character evaluations on average and at most $6\log_2 p$ multiplications ([7, page 96]).

Lehmer's method is based on evaluating Lucas sequences. Let us consider the sequence $(V_k)_{k\geq 0}$ defined by $V_0 = 2$, $V_1 = z$, and $V_k = zV_{k-1} - aV_{k-2}$, for all $k \geq 2$, where $z \in \mathbb{Z}_p$ is generated such that $z^2 - 4a$ is a quadratic non-residue. Lehmer has proved that

$$\sqrt{a} = \frac{1}{2}V_{\frac{p+1}{2}},$$

and his method requires two quadratic character evaluations on average and about $4.5\log_2 p$ multiplications ([18]). Müller [15] has proposed an improved variant that requires only $2\log_2 p$ multiplications, which will be referred to as the Improved Cipolla-Lehmer.
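The extension-field arithmetic is short to write down; the following Python sketch (ours) implements Cipolla's method exactly as described above, with $z$ found by trial search. Lehmer's Lucas-sequence variant and Müller's improvement are not shown.

```python
def cipolla_sqrt(a, p):
    """Square root of a quadratic residue a modulo an odd prime p (Cipolla's method)."""
    z = 0
    while pow((z * z - a) % p, (p - 1) // 2, p) != p - 1:
        z += 1                                   # find z with z^2 - a a non-residue
    w = (z * z - a) % p                          # the "radicand" of the extension field

    def mul(u, v):                               # (x, y)*(x', y') in Z_p(sqrt(w))
        return ((u[0] * v[0] + u[1] * v[1] * w) % p,
                (u[0] * v[1] + u[1] * v[0]) % p)

    r, base, e = (1, 0), (z, 1), (p + 1) // 2    # compute (z, 1)^((p+1)/2)
    while e:
        if e & 1:
            r = mul(r, base)
        base = mul(base, base)
        e >>= 1
    return r[0]                                  # equals (sqrt(a), 0) by Cipolla's remark

b = cipolla_sqrt(10, 13)
assert b * b % 13 == 10
```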


5.4.3. Test Results

We have implemented Improved Generalized Atkin (Imp-Gen-Atk) and the fastest known algorithms, namely Tonelli-Shanks-Bernstein (Ton-Sha-Ber) and Improved Cipolla-Lehmer (Imp-Cip-Leh). For all pairs $(\log_2 p, s)$, $\log_2 p \in \{128, 256, 512, 1024\}$, $s \in \{4, 8, 16, 32, \frac{\log_2 p}{2}\}$, we have generated 32 pairs $(p, a)$, where $a$ is a quadratic residue modulo $p$, and we have counted the average number of modular squarings and regular modular multiplications. We have considered two cases, depending on whether $p$ is known a priori or not. We have not included the computation required for finding a quadratic non-residue modulo $p$. For exponentiation we have considered the simplest method, namely square-and-multiply exponentiation. In the case of an exponent $x$, this method requires $\log_2 x$ squarings and $Hw(x)$ regular multiplications.

For Improved Generalized Atkin we choose the optimal $k = \lceil\sqrt{s}\rceil$, requiring $s + 2\lceil\sqrt{s}\rceil$ stored values. For Tonelli-Shanks-Bernstein, given that the number of needed precomputed values is $\frac{s 2^w}{w}$, in order to reach a number of elements comparable with ours, we choose the parameter $w = 2$ (which leads to $2s$ elements). We have to remark that the performance of Improved Cipolla-Lehmer does not depend on $s$.

We present the results for the case that $p$ is not known a priori in Tables 1-5. In each column the first value indicates the average number of squarings and the second one denotes the average number of regular multiplications.

Method         log2p = 128    log2p = 256    log2p = 512    log2p = 1024
Imp-Gen-Atk    256 / 138      512 / 262      1024 / 515     2048 / 1036
Ton-Sha-Ber    255 / 146      511 / 300      1023 / 562     2047 / 1076
Imp-Cip-Leh    126 / 124      254 / 252      510 / 508      1022 / 1020

Table 1. Comparison between methods for s = 4, where p is unknown

Method         log2p = 128    log2p = 256    log2p = 512    log2p = 1024
Imp-Gen-Atk    260 / 136      515 / 267      1027 / 521     2052 / 1031
Ton-Sha-Ber    255 / 179      511 / 300      1023 / 562     2047 / 1076
Imp-Cip-Leh    126 / 124      254 / 252      510 / 508      1022 / 1020

Table 2. Comparison between methods for s = 8, where p is unknown

Method         log2p = 128    log2p = 256    log2p = 512    log2p = 1024
Imp-Gen-Atk    270 / 138      527 / 272      1039 / 527     2062 / 1041
Ton-Sha-Ber    255 / 284      511 / 412      1023 / 668     2047 / 1178
Imp-Cip-Leh    126 / 124      254 / 252      510 / 508      1022 / 1020

Table 3. Comparison between methods for s = 16, where p is unknown


Method         log2p = 128    log2p = 256    log2p = 512    log2p = 1024
Imp-Gen-Atk    303 / 171      559 / 297      1070 / 551     2096 / 1059
Ton-Sha-Ber    253 / 301      509 / 426      1021 / 686     2045 / 1195
Imp-Cip-Leh    126 / 124      254 / 252      510 / 508      1022 / 1020

Table 4. Comparison between methods for s = 32, where p is unknown

Method         log2p = 128    log2p = 256    log2p = 512    log2p = 1024
Imp-Gen-Atk    392 / 217      904 / 526      2084 / 1334    5096 / 3721
Ton-Sha-Ber    253 / 723      509 / 2468     1021 / 9021    2045 / 34431
Imp-Cip-Leh    126 / 124      254 / 252      510 / 508      1022 / 1020

Table 5. Comparison between methods for s = log2(p)/2, where p is unknown

In the case that $p$ is not known a priori, Improved Cipolla-Lehmer is clearly the best, while our algorithm is comparable with Tonelli-Shanks-Bernstein.

We are interested in determining the values of $s$ for which our algorithm is more efficient than Improved Cipolla-Lehmer and/or Tonelli-Shanks-Bernstein, considering the case that $p$ is known a priori. We express 1E as $\log_2 p\,$S $+ \frac{\log_2 p - s}{2}\,$M. To simplify the comparisons we no longer distinguish between squarings and regular multiplications.

More precisely, let us first determine $s$ such that our algorithm is more efficient than Improved Cipolla-Lehmer in terms of total computation:

$$\log_2 p + \frac{\log_2 p - s}{2} + \frac{3s}{4} + \frac{s}{2} + \frac{1}{4}(s\lceil\sqrt{s}\rceil) + \frac{1}{4}(s\lceil\sqrt{s}\rceil) + \lceil\sqrt{s}\rceil < 2\log_2 p$$

We obtain the following sequence of equivalent inequalities:

$$\log_2 p + \frac{\log_2 p - s}{2} + \frac{s\lceil\sqrt{s}\rceil}{2} + \frac{5s}{4} + \lceil\sqrt{s}\rceil < 2\log_2 p$$
$$\frac{s\lceil\sqrt{s}\rceil}{2} + \frac{3s}{4} + \lceil\sqrt{s}\rceil < \frac{\log_2 p}{2}$$
$$s\lceil\sqrt{s}\rceil + \frac{3s}{2} + 2\lceil\sqrt{s}\rceil < \log_2 p$$

We now turn our attention to Tonelli-Shanks-Bernstein with the parameter $w = 2$. A more thorough analysis of this algorithm gives us $\log_2 p + \frac{\log_2 p - s}{2} + \frac{s^2}{8} + \frac{3s}{2}$ multiplications.

We obtain the following inequality:

$$\frac{s\lceil\sqrt{s}\rceil}{2} + \frac{5s}{4} + \lceil\sqrt{s}\rceil < \frac{s^2}{8} + \frac{3s}{2}$$

which leads to $s > 20$.
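The two thresholds can be checked numerically; the snippet below (ours) simply evaluates both inequalities. With $\log_2 p = 256$ the first one holds up to $s = 32$, and the second one first holds at $s = 21$, consistent with the $s > 20$ threshold above.

```python
from math import isqrt

def ceil_sqrt(s):
    r = isqrt(s)
    return r if r * r == s else r + 1

def beats_improved_cipolla_lehmer(s, log2p):
    # s*ceil(sqrt(s)) + 3s/2 + 2*ceil(sqrt(s)) < log2(p)
    return s * ceil_sqrt(s) + 1.5 * s + 2 * ceil_sqrt(s) < log2p

def beats_tonelli_shanks_bernstein(s):
    # s*ceil(sqrt(s))/2 + 5s/4 + ceil(sqrt(s)) < s^2/8 + 3s/2
    c = ceil_sqrt(s)
    return s * c / 2 + 1.25 * s + c < s * s / 8 + 1.5 * s

print(max(s for s in range(2, 200) if beats_improved_cipolla_lehmer(s, 256)))  # 32
print(min(s for s in range(2, 200) if beats_tonelli_shanks_bernstein(s)))      # 21
```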


We present the results for the case that $p$ is known a priori in Tables 6-10. We remind the reader that in each column the first value indicates the average number of squarings and the second one denotes the average number of regular multiplications.

Method         log2p = 128    log2p = 256    log2p = 512    log2p = 1024
Imp-Gen-Atk    128 / 72       256 / 133      512 / 269      1024 / 520
Ton-Sha-Ber    126 / 71       254 / 136      510 / 265      1022 / 522
Imp-Cip-Leh    126 / 124      254 / 252      510 / 508      1022 / 1020

Table 6. Comparison between methods for s = 4, where p is known a priori

Method         log2p = 128    log2p = 256    log2p = 512    log2p = 1024
Imp-Gen-Atk    131 / 80       260 / 136      515 / 276      1028 / 522
Ton-Sha-Ber    127 / 112      255 / 176      511 / 305      1023 / 559
Imp-Cip-Leh    126 / 124      254 / 252      510 / 508      1022 / 1020

Table 7. Comparison between methods for s = 8, where p is known a priori

Method         log2p = 128    log2p = 256    log2p = 512    log2p = 1024
Imp-Gen-Atk    141 / 79       270 / 155      526 / 262      1038 / 524
Ton-Sha-Ber    126 / 125      254 / 187      510 / 316      1022 / 574
Imp-Cip-Leh    126 / 124      254 / 252      510 / 508      1022 / 1020

Table 8. Comparison between methods for s = 16, where p is known a priori

Method         log2p = 128    log2p = 256    log2p = 512    log2p = 1024
Imp-Gen-Atk    176 / 116      308 / 182      560 / 317      1072 / 560
Ton-Sha-Ber    126 / 246      254 / 310      510 / 441      1020 / 697
Imp-Cip-Leh    126 / 124      254 / 252      510 / 508      1022 / 1020

Table 9. Comparison between methods for s = 32, where p is known a priori


Method         log2p = 128    log2p = 256    log2p = 512    log2p = 1024
Imp-Gen-Atk    268 / 186      649 / 453      1597 / 1257    4043 / 3425
Ton-Sha-Ber    126 / 686      254 / 2399     510 / 8895     1022 / 34172
Imp-Cip-Leh    126 / 124      254 / 252      510 / 508      1022 / 1020

Table 10. Comparison between methods for s = log2(p)/2, where p is known a priori

6. Conclusions and Future Work

In this paper we have extended Atkin's algorithm to the general case $p \equiv 2^s + 1 \bmod 2^{s+1}$, for any $s \geq 2$, thus providing a complete solution for the case $p \equiv 1 \bmod 4$. Complexity analysis and comparisons with other methods have also been provided.

An interesting problem is extending our algorithm to arbitrary finite fields. In the case of the finite fields $GF(p^k)$, for $k$ odd, the efficient techniques described in [11], [9] can be adapted to our case in a straightforward manner, but, to the best of our knowledge, there are no similar techniques for the case $GF(p^k)$, for $k$ even. We will focus on this topic in our future work.

Acknowledgements

We would like to thank the two anonymous reviewers for their helpful suggestions.

References

[1] Ankeny, N. C.: The Least Quadratic Non Residue, Annals of Mathematics, 55(1), 1952, 65–72.

[2] Atkin, A.: Probabilistic primality testing (summary by F. Morain), Technical Report 1779, INRIA, 1992, URL: http://algo.inria.fr/seminars/sem91-92/atkin.pdf.

[3] Atkin, A., Morain, F.: Elliptic Curves and Primality Proving, Mathematics of Computation, 61(203), 1993, 29–68.

[4] Bach, E., Shallit, J.: Algorithmic Number Theory, Volume I: Efficient Algorithms, MIT Press, 1996.

[5] Bernstein, D. J.: Faster square roots in annoying finite fields (preprint), 2001, URL: http://cr.yp.to/papers/sqroot.pdf.

[6] Cipolla, M.: Un metodo per la risoluzione della congruenza di secondo grado, Rendiconto dell’Accademia delle Scienze Fisiche e Matematiche, Napoli, 9, 1903, 154–163.

[7] Crandall, R., Pomerance, C.: Prime Numbers. A Computational Perspective, Springer-Verlag, 2001.

[8] Eikenberry, S., Sorenson, J.: Efficient Algorithms for Computing the Jacobi Symbol, Journal of Symbolic Computation, 26(4), 1998, 509–523.

[9] Han, D.-G., Choi, D., Kim, H.: Improved Computation of Square Roots in Specific Finite Fields, IEEE Transactions on Computers, 58(2), 2009, 188–196.
