Starting approximations for the iterative calculation of
square roots
By J. Eve
Several starting approximations are given which, in conjunction with a well-known iterative
process, lead to square root approximations, with a relative error in the range (2^-35, 2^-45), at
the expense of three divisions. More accurate approximations are given which require, in addition,
a single multiplication.
In square root subroutines based on the iterative process

    x_{n+1} = ½(x_n + y x_n^-1),    Lt_{n→∞} x_n = y^{1/2}    (1)

a reduction in the number of iterations needed to achieve a prescribed accuracy can be effected by expressing an arbitrary operand, Y, in the form Y = y·2^{2n}, where n is chosen so that ¼ ≤ y < 1, and then using Y^{1/2} = y^{1/2}·2^n. The reduction in the number of iterations is simply a consequence of the fact that as the range of allowed y values decreases, increasingly accurate initial approximations, x_0, are possible. A further improvement results on splitting the range of y, i.e. using one approximation to represent x_0 when ¼ ≤ y < ½ and another when ½ ≤ y < 1.

The relative errors, e_i(y) = x_i y^{-1/2} − 1, of the approximations, x_i, generated by (1) satisfy

    e_{i+1} = ½ e_i^2 (1 + e_i)^-1.

This feature, together with the economy of arithmetic operations involved in carrying out one cycle of the iterative process, implies that there is little, if anything, to be gained by using sophisticated approximations to represent x_0. The choice of possible forms to represent x_0 is therefore limited to

    x_0 = a_0 + a_1 y    (2)
    x_0 = b_0 + b_1 (y + b_2)^-1    (3)
    x_0 = c_0 + c_1 y + c_2 y^2.    (4)

The bilinear approximation

Approximations of the form (3) have already been given by Kogbetliantz (1959). It is believed, however, that those given here are optimal; the relative errors are about a quarter of those quoted by Kogbetliantz.

It is evident that the three parameters b_0, b_1 and b_2 in (3) may be chosen to make e_0 vanish at three arbitrary values of y. If these three points lie in some range (m, n), then e_0 has two stationary values in (m, n). Let these be at the points y = A and y = B, where A < B. It follows from a theorem of Chebyshev (Achieser, 1956) that max |e_0| is a minimum over the range (m, n) when

    e_0(m) = −e_0(A) = e_0(B) = −e_0(n).    (5)

A sufficiently close approximation to this minimal solution can be obtained quite simply by considering

    z^{1/2} ≈ (1 + wz)(w + z)^-1, where t^-1 ≤ z ≤ t.    (6)

This approximation, which can be transformed to the range (m, n) by z = y(mn)^{-1/2}, has a relative error such that

    e(z^-1) + e(z) = −e(z^-1) e(z).    (7)

For a sufficiently small range (t^-1, t) the right-hand side of (7) is of the second degree in small quantities, so that e(z^-1) ≈ −e(z), which implies that e(A) = −e(B). Consequently the equations (5) will be approximately satisfied for the range (t^-1, t) if w is chosen so that e(B) = −e(t).

For the value t = 2^{1/2}, a simple iterative process sufficed to determine a value of w such that e(2^{-1/2}), −e(A), e(B) and −e(2^{1/2}) are in the range (323 ± 1) × 10^-6. For practical purposes the equations (5) are satisfied by these values and there is no advantage in using the more general form (3) in place of (6). The transformations z = 2^{3/2}y and z = 2^{1/2}y applied to (6) then provide the approximations

    x_0 = 1.797210 − 1.710324(y + 1.068628)^-1, ¼ ≤ y ≤ ½
    x_0 = 2.541639 − 4.837528(y + 2.137255)^-1, ½ ≤ y ≤ 1    (8)

for which |e_0| < ⅓ × 2^-10. Equation (1) shows that two iterations yield an approximation with the error bounds 0 < e_2 < ¼ × 2^-47.
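In modern terms the whole scheme (range reduction, the bilinear start (8), and two cycles of the process (1)) can be sketched as follows. This is an illustrative reconstruction in Python, not part of the original paper; frexp/ldexp stand in for the shift operations the text assumes.

```python
import math

def x0_bilinear(y):
    """Starting approximations (8); assumes 1/4 <= y < 1."""
    if y < 0.5:
        return 1.797210 - 1.710324 / (y + 1.068628)
    return 2.541639 - 4.837528 / (y + 2.137255)

def sqrt_iterative(Y, iterations=2):
    """Square root of Y > 0 by range reduction and the process (1)."""
    m, e = math.frexp(Y)          # Y = m * 2**e with 1/2 <= m < 1
    if e % 2:                     # make the exponent even:
        m, e = m / 2.0, e + 1     # now 1/4 <= m < 1 and e = 2n
    x = x0_bilinear(m)
    for _ in range(iterations):
        x = 0.5 * (x + m / x)     # one division per cycle
    return math.ldexp(x, e // 2)  # multiply by 2**n
```

With the starting error of (8), two cycles already bring the relative error down to roughly 2^-49, i.e. well below double-precision rounding level.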
The linear approximations

In the case of linear approximations, of the form (2), there can be only one stationary value, at y = A, and the conditions determining the parameters a_0 and a_1 become

    e_0(m) = −e_0(A) = e_0(n).    (9)

The approximations satisfying these conditions are

    x_0 = 0.2951 + 0.8346y, ¼ ≤ y ≤ ½
    x_0 = 0.4173 + 0.5902y, ½ ≤ y ≤ 1    (10)

where |e_0| < 0.96 × 2^-7. In many applications, after two iterations using these initial approximations, the accuracy obtained (0 < e_2 < 2^-31) will be inadequate, while after three iterations the accuracy (0 < e_3 < 2^-63) will be excessive. The possibility of taking advantage of this situation by dispensing with some of the accuracy in x_3 in favour of an initial approximation which does not involve a multiplication is examined by considering the approximations

    x_0 = d_0 + y,    x_0 = e_0 + ½y    (11)
    x_0 = d_0 + y − ⅛y,    x_0 = e_0 + ½(y + ⅛y)    (12)

In (11) and (12) the terms in y involve shifts rather than a multiplication, which will result in a time saving on many computers. In the approximations (12) the shifts have been chosen so that the coefficients of y approximate those in (10); in (11), a crude but simple representation of the coefficients of y in (10) has been chosen.

Arbitrarily fixing one of the parameters in the general linear form may or may not preclude one of the two possible zeros of e_0(y) from lying in the range (m, n). In either case, intuitively it would appear that the best solution would satisfy the condition that two of the three quantities e_0(m), e_0(A) and e_0(n) are equal in magnitude and opposite in sign, while the third is either less in magnitude than the other two or, if larger in magnitude, is the stationary value lying outside the range (m, n). Since the work in calculating x_1 from the linear approximations is roughly comparable with that for calculating x_0 from the bilinear approximation, the error e_1 is quoted for the linear approximations. Consequently the parameters in (11) and (12) were chosen to correspond with those of the best approximation satisfying the conditions that two of the quantities e_1(m), e_1(A) and e_1(n) are equal, while the third is either less in magnitude or, if larger in magnitude, is e_1(A) with A outside the range (m, n).

Examination of the relative errors in the three cases for each of the four approximations in (11) and (12) leads to the choice

    x_0 = 0.2326 + y, ¼ ≤ y ≤ ½, 0 < e_1 < ⅓ × 2^-9
    x_0 = 0.4753 + ½y, ½ ≤ y ≤ 1, 0 < e_1 < ⅓ × 2^-10    (11')

and

    x_0 = 0.27859 + y − ⅛y, ¼ ≤ y ≤ ½, 0 < e_1 < ⅓ × 2^-12
    x_0 = 0.43417 + ½(y + ⅛y), ½ ≤ y ≤ 1, 0 < e_1 < 0.2801 × 2^-12.    (12')

For most purposes the approximations (11') can be conveniently replaced by

    x_0 = ½f + y, ¼ ≤ y ≤ ½
    x_0 = f + ½y, ½ ≤ y ≤ 1    (11'')

where f = 0.4656 and 0 < e_1 < ⅓ × 2^-9. After three iterations, for (11') and (11'') the error bounds on x_3 are 0 < e_3 < ¼ × 2^-43, while for (12'), 0 < e_3 < ¼ × 2^-55.

The quadratic approximations

The best quadratic approximations (in the sense that extremal values of the relative error are of equal magnitude and of alternate sign) are

    x_0 = 0.221715 + 1.258997y − 0.578258y^2, ¼ ≤ y ≤ ½
    x_0 = 0.313553 + 0.890245y − 0.204445y^2, ½ ≤ y ≤ 1    (13)

for which |e_0| < ⅓ × 2^-9. As might be expected these approximations are poorer than the bilinear approximations, and since the work involved in evaluation is more than comparable, they are unlikely to be of practical value.

As in the case of the linear approximations, a multiplication can be replaced by shift operations, with some loss of precision, if the coefficient of y^2 is suitably chosen. In the approximations

    x_0 = 0.23077 + y(1.20440 − ½y), ¼ ≤ y ≤ ½, |e_0| < ⅓ × 2^-8
    x_0 = 0.28841 + y(0.95967 − ¼y), ½ ≤ y ≤ 1, |e_0| < ½ × 2^-8    (14)

and

    x_0 = 0.223437 + y(1.248357 − ½y − (1/16)y), ¼ ≤ y ≤ ½, |e_0| < 0.38 × 2^-9
    x_0 = 0.305106 + y(0.912674 − ¼y + (1/32)y), ½ ≤ y ≤ 1, |e_0| < ½ × 2^-9    (15)

the coefficients of y^2 were chosen to approximate those in (13). The first of these, after two iterations, provides sufficient accuracy for a computer with a short word-length, and the second, while extremely accurate, is rather cumbersome.

Conclusions

The frequency of use of square root subroutines is usually used to justify a variety of programming tricks which reduce the execution time. Therefore, once the required accuracy has been specified, any remaining choice between the various initial approximations will be determined largely by the facilities available on a particular computer. The other factor affecting the choice is the amount of storage required: (11''), requiring storage for a single constant, and (11') and (12'), which require storage for two constants, will be better than (8) and (10) or any of the quadratic approximations. Where accuracies in excess of 60 bits are required, approximations (14) and (15) achieve this in three iterations.
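The accuracy figures quoted above follow directly from the error recurrence e_{i+1} = ½e_i^2(1 + e_i)^-1. The sketch below is an illustrative check, not part of the original paper: it propagates a starting error at the level of the shift-adjusted quadratics, |e_0| ≈ ½ × 2^-9, through three cycles, confirming accuracy far in excess of 60 bits. The sign of the error is ignored; for small errors only the magnitude matters here.

```python
def next_error(e):
    # Relative-error recurrence implied by one cycle of the process (1).
    return e * e / (2.0 * (1.0 + e))

e = 0.5 * 2.0 ** -9      # assumed worst-case starting error
for _ in range(3):       # three iterations, i.e. three divisions
    e = next_error(e)
# e is now far below 2**-60 (roughly doubling the number of
# correct bits, less one, at every cycle)
```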
To complete the discussion, approximations valid for the whole range ¼ ≤ y < 1 are given below:

    x_0 = 2.18518 − 3.02289(y + 1.54516)^-1, |e_0| < ⅓ × 2^-7
    x_0 = 0.3431 + 0.6863y, |e_0| < 0.95 × 2^-5
    x_0 = 0.2601 + y(1.0481 − ¼y − (1/16)y), |e_0| < ⅓ × 2^-6.

The last approximation is very close to the best possible quadratic approximation for the range ¼ ≤ y < 1. In applying any of these approximations a certain amount of scaling will be necessary to prevent overflow during the divisions; in the case of the linear approximations this would occur with y in the neighbourhood of unity.
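As an illustrative check (not part of the original paper), the three whole-range approximations can be evaluated on a grid and their worst relative errors compared with the quoted bounds. The grid size is a choice of this sketch, not of the text.

```python
import math

# Whole-range starting approximations for 1/4 <= y < 1, constants as quoted.
def start_bilinear(y):
    return 2.18518 - 3.02289 / (y + 1.54516)

def start_linear(y):
    return 0.3431 + 0.6863 * y

def start_quadratic(y):
    # 0.3125 = 1/4 + 1/16, so the y**2 term needs only shifts and adds
    return 0.2601 + y * (1.0481 - 0.3125 * y)

def worst_error(f, steps=20000):
    """Largest observed |e_0| = |f(y)/sqrt(y) - 1| on a grid over [1/4, 1]."""
    worst = 0.0
    for i in range(steps + 1):
        y = 0.25 + 0.75 * i / steps
        worst = max(worst, abs(f(y) / math.sqrt(y) - 1.0))
    return worst
```

On such a grid the observed maxima fall just inside ⅓ × 2^-7, 0.95 × 2^-5 and ⅓ × 2^-6 respectively, consistent with the bounds above.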
References
ACHIESER, N. I. (1956). Theory of Approximation, New York: Frederick Ungar.
KOGBETLIANTZ, E. G. (1959). "Computation of sin N, cos N and √N Using an Electronic Computer," I.B.M. Journal of
Research and Development, Vol. 3, p. 147.
Book Reviews
Elementary Mathematical Programming. By ROBERT W.
METZGER, 1963; 246 pages. (London and New York:
John Wiley & Sons Ltd., 15s.)
This elementary text is a paperback reprint of an edition which
appeared first in 1958. Its aim is limited to introducing linear
programming to the non-mathematical reader, with examples
from production planning, paper trim, blending and transport
scheduling.
An admirably lucid style permits more difficult material to
be introduced, such as multi-dimensional programming,
without danger of confusion to the newcomer. It includes
several approximate programming methods and relates them
to the more formal methods in a balanced way.
It is revealing that an introductory text dating from 1958
should still be adequate; subsequent innovations (integer
programming, non-linearity, multi-dimensional models) have
not yet reached the level of an elementary exposition.
At its price this book compares well with other introductory
expositions of linear programming aimed at the non-mathematician.
D. A. BRACE.
U.S.S.R. Computational Mathematics and Mathematical Physics
Number 1, 1962; 202 pages. (Oxford: Pergamon Press, £50 per year, 6 issues).*
This is the first issue of an English translation of Zhurnal
Vychislitel'noi Matematiki i Matematicheskoi Fiziki, which
began publication in 1961 as the first journal in the U.S.S.R.
to be devoted specifically to numerical analysis and mathematical physics. Academician A. A. Dorodnitsyn is the
Chief Editor of the Russian journal, which is published by
the Department of Physico-Mathematical Sciences of the
U.S.S.R. Academy of Sciences. The version published by
Pergamon Press provides an English translation of all the
original papers in the Russian journal, and it is intended to
publish the translations within five months of the appearance
of the original paper (not necessarily in the same order of
publication). Dr. R. A. Buckingham, of the University of
London Computer Unit, is the Scientific Translation Editor.
* The first issue announces quarterly publication at £25 per year, but a pasted slip gives these modified rates, commencing 1963.

This entire issue has been translated from Volume 1,
No. 1 of the Russian journal by Mrs. Ruth Feinstein, who
has also provided abstracts for all of the papers. Mrs.
Feinstein has coped admirably with the problem of translating
the papers into fluent mathematical English, and the only
emendations which I would propose are the following. (1)
Pp. 97ff, change "Whittaker's degenerate geometric function"
to "Whittaker's confluent hypergeometric function". (2)
Pp. 109ff, change "Airey" to "Airy". (3) P. 139, change
"motionless point" to "fixed point". (4) P. 186, change
"transform" to "transpose". (5) P. 187, change "reversible"
to "non-singular". The translated journal is neatly printed
by DRP (Warsaw): I noted only some unimportant misprints
on pp. 71 and 139.
This first issue contains translations of 13 papers, beginning
with a long article by A. N. Tikhonov and A. A. Samarskii
on "Homogeneous Difference Schemes". It is a summary
and a revision of a series of nine papers previously published
by them, devoted to the problem of producing in a uniform
manner systems of finite-difference equations approximating
to ordinary differential equations. Their analysis is very
general, paying particular attention to discontinuities in the
coefficients of the differential equation, the stability of the
solution of the system of finite-difference equations and the
accuracy with which it approximates to the solution of the
differential equation. N . S. Bakhvalov, in "An estimate of
the mean remainder term in quadrature formulae", investigates
the mathematical expectation of the error arising from
numerical integration (ordinary or multi-dimensional) of
functions whose p'th derivatives satisfy a Hölder condition.
Tan Chzhen', in "A lattice method for the orthogonalization
of the solution of systems of simultaneous linear algebraic
equations with a large number of unknowns", pays particular
attention to the efficient use of magnetic tape in solving large
systems of linear equations by orthogonalization. V. I.
Ivanov's paper on "The asymptotic expansion of Green's
function for the diffraction of short waves by a paraboloid of
revolution (axisymmetric case)" derives expressions for the
Green's function in penumbral regions partially shaded by a
paraboloid, correcting earlier formulae by Klante.
In her paper on "The numerical solution of a non-stationary
filtration problem", V. F. Baklanovskaya discusses a numerical
method for solving the equation ∂u/∂t = ∂²(u²)/∂x², which
occurs in the study of filtration of fluids in porous media.
V. P. Maslov, in "The quasi-classical asymptotic solutions of
(continued on p. 286)