# Nash - Compact Numerical Methods for Computers


COMPACT NUMERICAL

METHODS

FOR COMPUTERS

linear algebra and

function minimisation

Second Edition

J C NASH

Adam Hilger, Bristol and New York

Copyright © 1979, 1990 J C Nash

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system

or transmitted in any form or by any means, electronic, mechanical, photocopying or

otherwise, without the prior permission of the publisher. Multiple copying is only permitted

under the terms of the agreement between the Committee of Vice-Chancellors and Principals

and the Copyright Licensing Agency.

British Library Cataloguing in Publication Data

Nash, J. C.

Compact numerical methods for computers: linear algebra

and function minimisation - 2nd ed.

1. Numerical analysis. Applications of microcomputer &

minicomputer systems. Algorithms

I. Title

519.4

ISBN 0-85274-318-1
ISBN 0-85274-319-X (pbk)
ISBN 0-7503-0036-1 (5¼" IBM disc)
ISBN 0-7503-0043-4 (3½" IBM disc)

Library of Congress Cataloging-in-Publication Data are available

First published, 1979

Reprinted, 1980

Second edition, 1990

Published under the Adam Hilger imprint by IOP Publishing Ltd

Techno House, Redcliffe Way, Bristol BS1 6NX, England

335 East 45th Street, New York, NY 10017-3483, USA

Filmset by Bath Typesetting Ltd, Bath, Avon

Printed in Great Britain by Page Bros (Norwich) Ltd

CONTENTS

Preface to the Second Edition
Preface to the First Edition

1. A STARTING POINT
1.1. Purpose and scope
1.2. Machine characteristics
1.3. Sources of programs
1.4. Programming languages used and structured programming
1.5. Choice of algorithms
1.6. A method for expressing algorithms
1.7. General notation
1.8. Software engineering issues

2. FORMAL PROBLEMS IN LINEAR ALGEBRA
2.1. Introduction
2.2. Simultaneous linear equations
2.3. The linear least-squares problem
2.4. The inverse and generalised inverse of a matrix
2.5. Decompositions of a matrix
2.6. The matrix eigenvalue problem

3. THE SINGULAR-VALUE DECOMPOSITION AND ITS USE TO SOLVE LEAST-SQUARES PROBLEMS
3.1. Introduction
3.2. A singular-value decomposition algorithm
3.3. Orthogonalisation by plane rotations
3.4. A fine point
3.5. An alternative implementation of the singular-value decomposition
3.6. Using the singular-value decomposition to solve least-squares problems

4. HANDLING LARGER PROBLEMS
4.1. Introduction
4.2. The Givens' reduction
4.3. Extension to a singular-value decomposition
4.4. Some labour-saving devices
4.5. Related calculations

5. SOME COMMENTS ON THE FORMATION OF THE CROSS-PRODUCTS MATRIX AᵀA

6. LINEAR EQUATIONS - A DIRECT APPROACH
6.1. Introduction
6.2. Gauss elimination
6.3. Variations on the theme of Gauss elimination
6.4. Complex systems of equations
6.5. Methods for special matrices

7. THE CHOLESKI DECOMPOSITION
7.1. The Choleski decomposition
7.2. Extension of the Choleski decomposition to non-negative definite matrices
7.3. Some organisational details

8. THE SYMMETRIC POSITIVE DEFINITE MATRIX AGAIN
8.1. The Gauss-Jordan reduction
8.2. The Gauss-Jordan algorithm for the inverse of a symmetric positive definite matrix

9. THE ALGEBRAIC EIGENVALUE PROBLEM
9.1. Introduction
9.2. The power method and inverse iteration
9.3. Some notes on the behaviour of inverse iteration
9.4. Eigensolutions of non-symmetric and complex matrices

10. REAL SYMMETRIC MATRICES
10.1. The eigensolutions of a real symmetric matrix
10.2. Extension to matrices which are not positive definite
10.3. The Jacobi algorithm for the eigensolutions of a real symmetric matrix
10.4. Organisation of the Jacobi algorithm
10.5. A brief comparison of methods for the eigenproblem of a real symmetric matrix

11. THE GENERALISED SYMMETRIC MATRIX EIGENVALUE PROBLEM

12. OPTIMISATION AND NONLINEAR EQUATIONS
12.1. Formal problems in unconstrained optimisation and nonlinear equations
12.2. Difficulties encountered in the solution of optimisation and nonlinear-equation problems

13. ONE-DIMENSIONAL PROBLEMS
13.1. Introduction
13.2. The linear search problem
13.3. Real roots of functions of one variable

14. DIRECT SEARCH METHODS
14.1. The Nelder-Mead simplex search for the minimum of a function of several parameters
14.2. Possible modifications of the Nelder-Mead algorithm
14.3. An axial search procedure
14.4. Other direct search methods

15. DESCENT TO A MINIMUM I: VARIABLE METRIC ALGORITHMS
15.1. Descent methods for minimisation
15.2. Variable metric algorithms
15.3. A choice of strategies

16. DESCENT TO A MINIMUM II: CONJUGATE GRADIENTS
16.1. Conjugate gradients methods
16.2. A particular conjugate gradients algorithm

17. MINIMISING A NONLINEAR SUM OF SQUARES
17.1. Introduction
17.2. Two methods
17.3. Hartley's modification
17.4. Marquardt's method
17.5. Critique and evaluation
17.6. Related methods

18. LEFT-OVERS
18.1. Introduction
18.2. Numerical approximation of derivatives
18.3. Constrained optimisation
18.4. A comparison of function minimisation and nonlinear least-squares methods

19. THE CONJUGATE GRADIENTS METHOD APPLIED TO PROBLEMS IN LINEAR ALGEBRA
19.1. Introduction
19.2. Solution of linear equations and least-squares problems by conjugate gradients
19.3. Inverse iteration by algorithm 24
19.4. Eigensolutions by minimising the Rayleigh quotient

APPENDICES
1. Nine test matrices
2. List of algorithms
3. List of examples
4. Files on the software diskette

BIBLIOGRAPHY

INDEX

PREFACE TO THE SECOND EDITION

The first edition of this book was written between 1975 and 1977. It may come as a

surprise that the material is still remarkably useful and applicable in the solution of

numerical problems on computers. This is perhaps due to the interest of researchers

in the development of quite complicated computational methods which require

considerable computing power for their execution. More modest techniques have

received less time and effort from investigators. However, it has also been the case that

the algorithms presented in the first edition have proven to be reliable yet simple.

The need for simple, compact numerical methods continues, even as software

packages appear which relieve the user of the task of programming. Indeed, such

methods are needed to implement these packages. They are also important when

users want to perform a numerical task within their own programs.

The most obvious difference between this edition and its predecessor is that the

algorithms are presented in Turbo Pascal, to be precise, in a form which will operate

under Turbo Pascal 3.01a. I decided to use this form of presentation for the following

reasons:

(i) Pascal is quite similar to the Step-and-Description presentation of algorithms

used previously;

(ii) the codes can be typeset directly from the executable Pascal code, and the

very difficult job of proof-reading and correction avoided;

(iii) the Turbo Pascal environment is very widely available on microcomputer

systems, and a number of similar systems exist.

Section 1.6 and appendix 4 give some details about the codes and especially the

driver and support routines which provide examples of use.

The realization of this edition was not totally an individual effort. My research

work, of which this book represents a product, is supported in part by grants from

the Natural Sciences and Engineering Research Council of Canada. The Mathematics Department of the University of Queensland and the Applied Mathematics

Division of the New Zealand Department of Scientific and Industrial Research

provided generous hospitality during my 1987-88 sabbatical year, during which a

great part of the code revision was accomplished. Thanks are due to Mary Walker-Smith for reading early versions of the codes, to Maureen Clarke of IOP Publishing

Ltd for reminders and encouragement, and to the Faculty of Administration of the

University of Ottawa for use of a laser printer to prepare the program codes. Mary

Nash has been a colleague and partner for two decades, and her contribution to this

project in many readings, edits, and innumerable other tasks has been a large one.

In any work on computation, there are bound to be errors, or at least program


structures which operate in unusual ways in certain computing environments. I

encourage users to report to me any such observations so that the methods may be

improved.

J. C. Nash

Ottawa, 12 June 1989

PREFACE TO THE FIRST EDITION

This book is designed to help people solve numerical problems. In particular, it is

directed to those who wish to solve numerical problems on ‘small’ computers, that

is, machines which have limited storage in their main memory for program and

data. This may be a programmable calculator-even a pocket model-or it may

be a subsystem of a monster computer. The algorithms that are presented in the

following pages have been used on machines such as a Hewlett-Packard 9825

programmable calculator and an IBM 370/168 with Floating Point Systems Array

Processor. That is to say, they are designed to be used anywhere that a problem

exists for them to attempt to solve. In some instances, the algorithms will not be

as efficient as others available for the job because they have been chosen and

developed to be ‘small’. However, I believe users will find them surprisingly

economical to employ because their size and/or simplicity reduces errors and

human costs compared with equivalent ‘larger’ programs.

Can this book be used as a text to teach numerical methods? I believe it can.

The subject areas covered are, principally, numerical linear algebra, function

minimisation and root-finding. Interpolation, quadrature and differential equations are largely ignored as they have not formed a significant part of my own

work experience. The instructor in numerical methods will find perhaps too few

examples and no exercises. However, I feel the examples which are presented

provide fertile ground for the development of many exercises. As much as

possible, I have tried to present examples from the real world. Thus the origins of

the mathematical problems are visible in order that readers may appreciate that

these are not merely interesting diversions for those with time and computers

available.

Errors in a book of this sort, especially in the algorithms, can depreciate its

value severely. I would very much appreciate hearing from anyone who discovers

faults and will do my best to respond to such queries by maintaining an errata

sheet. In addition to the inevitable typographical errors, my own included, I

anticipate that some practitioners will take exception to some of the choices I

have made with respect to algorithms, convergence criteria and organisation of

calculations. Out of such differences, I have usually managed to learn something

of value in improving my subsequent work, either by accepting new ideas or by

being reassured that what I was doing had been through some criticism and had

survived.

There are a number of people who deserve thanks for their contribution to this

book and who may not be mentioned explicitly in the text:

(i) in the United Kingdom, the many members of the Numerical Algorithms

Group, of the Numerical Optimization Centre and of various university departments with whom I discussed the ideas from which the algorithms have condensed;


(ii) in the United States, the members of the Applied Mathematics Division of the

Argonne National Laboratory who have taken such an interest in the algorithms,

and Stephen Nash who has pointed out a number of errors and faults; and

(iii) in Canada, the members of the Economics Branch of Agriculture Canada for

presenting me with such interesting problems to solve, Kevin Price for careful and

detailed criticism, Bob Henderson for trying out most of the algorithms, Richard

Wang for pointing out several errors in chapter 8, John Johns for trying (and

finding errors in) eigenvalue algorithms, and not least Mary Nash for a host of

corrections and improvements to the book as a whole.

It is a pleasure to acknowledge the very important roles of Neville Goodman

and Geoff Amor of Adam Hilger Ltd in the realisation of this book.

J. C. Nash

Ottawa, 22 December 1977


Chapter 1

A STARTING POINT

1.1. PURPOSE AND SCOPE

This monograph is written for the person who has to solve problems with (small)

computers. It is a handbook to help him or her obtain reliable answers to specific

questions, posed in a mathematical way, using limited computational resources.

To this end the solution methods proposed are presented not only as formulae but

also as algorithms, those recipes for solving problems which are more than merely

a list of the mathematical ingredients.

There has been an attempt throughout to give examples of each type of

calculation and in particular to give examples of cases which are prone to upset

the execution of algorithms. No doubt there are many gaps in the treatment

where the experience which is condensed into these pages has not been adequate

to guard against all the pitfalls that confront the problem solver. The process of

learning is continuous, as much for the teacher as the taught. Therefore, the user

of this work is advised to think for him/herself and to use his/her own knowledge and

familiarity of particular problems as much as possible. There is, after all, barely a

working career of experience with automatic computation and it should not seem

surprising that satisfactory methods do not exist as yet for many problems. Throughout the sections which follow, this underlying novelty of the art of solving numerical

problems by automatic algorithms finds expression in a conservative design policy.

Reliability is given priority over speed and, from the title of the work, space

requirements for both the programs and the data are kept low.

Despite this policy, it must be mentioned immediately and with some

emphasis that the algorithms may prove to be surprisingly efficient from a

cost-of-running point of view. In two separate cases where explicit comparisons

were made, programs using the algorithms presented in this book cost less to

run than their large-machine counterparts. Other tests of execution times for

algebraic eigenvalue problems, roots of a function of one variable and function

minimisation showed that the eigenvalue algorithms were by and large ‘slower’

than those recommended for use on large machines, while the other test problems

were solved with notable efficiency by the compact algorithms. That ‘small’

programs may be more frugal than larger, supposedly more efficient, ones based

on different algorithms to do the same job has at least some foundation in the way

today’s computers work.

Since the first edition of this work appeared, a large number and variety of

inexpensive computing machines have appeared. Often termed the ‘microcomputer

revolution’, the widespread availability of computing power in forms as diverse as

programmable calculators to desktop workstations has increased the need for


suitable software of all types, including numerical methods. The present work is

directed at the user who needs, for whatever reason, to program a numerical method

to solve a problem. While software packages and libraries exist to provide for the

solution of numerical problems, financial, administrative or other obstacles may

render their use impossible or inconvenient. For example, the programming tools

available on the chosen computer may not permit the packaged software to be used.

Firstly, most machines are controlled by operating systems which control (and

sometimes charge for) the usage of memory, storage, and other machine resources. In

both compilation (translation of the program into machine code) and execution, a

smaller program usually will make smaller demands on resources than a larger one.

On top of this, the time of compilation is usually related to the size of the source

code.

Secondly, once the program begins to execute, there are housekeeping operations

which must be taken care of:

(i) to keep programs and data belonging to one task or user separate from those

belonging to others in a time-sharing environment, and

(ii) to access the various parts of the program and data within the set of

resources allocated to a single user.

Studies conducted some years ago by Dr Maurice Cox of the UK National

Physical Laboratory showed that (ii) requires about 90% of the time a computer

spends with a typical scientific computation. Only about 10% of the effort goes to

actual arithmetic. This mix of activity will vary greatly with the machine and problem

under consideration. However, it is not unreasonable that a small program can use

simpler structures, such as address maps and decision tables, than a larger routine. It

is tempting to suggest that the computer may be able to perform useful work with a

small program while deciding what to do with a larger one. Gathering specific

evidence to support such conjectures requires the fairly tedious work of benchmarking. Moreover, the results of the exercise are only valid as long as the machine,

operating system, programming language translators and programs remain

unchanged. Where performance is critical, as in the case of real-time computations,

for example in air traffic control, then benchmarking will be worthwhile. In other

situations, it will suffice that programs operate correctly and sufficiently quickly that

the user is not inconvenienced.

This book is one of the very few to consider algorithms which have very low

storage requirements. The first edition appeared just as programmable calculators

and the first microcomputers were beginning to make their presence felt. These

brought to the user’s desk a quantum improvement in computational power.

Comparisons with the earliest digital computers showed that even a modest microcomputer was more powerful. It should be noted, however, that the programmer did

not have to handle all the details of arithmetic and data storage, as on the early

computers, thanks to the quick release of programming language translators. There

is unfortunately still a need to be vigilant for errors in the floating-point arithmetic

and the special function routines. Some aids to such checking are mentioned later in

§1.2.

Besides the motivation of cost savings or the desire to use an available and


possibly under-utilised small computer, this work is directed to those who share

my philosophy that human beings are better able to comprehend and deal with

small programs and systems than large ones. That is to say, it is anticipated that

the costs involved in implementing, modifying and correcting a small program will

be lower for small algorithms than for large ones, though this comparison will

depend greatly on the structure of the algorithms. By way of illustration, I

implemented and tested the eigenvalue/vector algorithm (algorithm 13) in under

half an hour from a 10 character/second terminal in Aberystwyth using a Xerox

Sigma 9 computer in Birmingham. The elapsed time includes my instruction in the

use of the system which was of a type I had not previously encountered. I am

grateful to Mrs Lucy Tedd for showing me this system. Dr John Johns of the

Herzberg Institute of Astrophysics was able to obtain useful eigensolutions from

the same algorithm within two hours of delivery of a Hewlett-Packard 9825

programmable calculator. He later discovered a small error in the prototype of

the algorithm.

The topics covered in this work are numerical linear algebra and function

minimisation. Why not differential equations? Quite simply because I have had

very little experience with the numerical solution of differential equations except

by techniques using linear algebra or function minimisation. Within the two broad

areas, several subjects are given prominence. Linear equations are treated in

considerable detail with separate methods given for a number of special situations.

The algorithms given here are quite similar to those used on larger machines. The

algebraic eigenvalue problem is also treated quite extensively, and in this edition, a

method for complex matrices is included. Computing the eigensolutions of a general

square matrix is a problem with many inherent difficulties, but we shall not dwell on

these at length.

Constrained optimisation is still a topic where I would be happier to offer more

material, but general methods of sufficient simplicity to include in a handbook of this

sort have eluded my research efforts. In particular, the mathematical programming

problem is not treated here.

Since the aim has been to give a problem-solving person some tools with which

to work, the mathematical detail in the pages that follow has been mostly confined

to that required for explanatory purposes. No claim is made to rigour in any

‘proof’, though a sincere effort has been made to ensure that the statement of

theorems is correct and precise.

1.2. MACHINE CHARACTERISTICS

In the first edition, a ‘small computer’ was taken to have about 6000 characters of

main memory to hold both programs and data. This logical machine, which might be

a part of a larger physical computer, reflected the reality facing a quite large group of

users in the mid- to late-1970s.

A more up-to-date definition of ‘small computer’ could be stated, but it is not

really necessary. Users of this book are likely to be those scientists, engineers, and

statisticians who must, for reasons of financial or administrative necessity or

convenience, carry out their computations in environments where the programs


cannot be acquired simply and must, therefore, be written in-house. There are also a

number of portable computers now available. This text is being entered on a Tandy

Radio Shack TRS-80 Model 100, which is only the size of a large book and is

powered by four penlight batteries.

Users of the various types of machines mentioned above often do not have much

choice as to the programming tools available. On ‘borrowed’ computers, one has to

put up with the compilers and interpreters provided by the user who has paid for the

resource. On portables, the choices may be limited by the decisions of the manufacturer. In practice, I have, until recently, mostly programmed in BASIC, despite its

limitations, since it has at least been workable on most of the machines available to

me.

Another group of users of the material in this book is likely to be software

developers. Some scenarios which have occurred are:

—software is being developed in a particular computing environment (e.g. LISP

for artificial intelligence) and a calculation is required for which suitable off-the-shelf

routines are not available;

—standard routines exist but when linked into the package cause the executable

code to be too large for the intended disk storage or memory;

—standard routines exist, but the coding is incompatible with the compiler or

interpreter at hand.

It is worth mentioning that programming language standards have undergone

considerable development in the last decade. Nevertheless, the goal of portable

source codes of numerical methods is still only won by careful and conservative

programming practices.

Because of the emphasis on the needs of the user to program the methods, there is

considerable concern in this book to keep the length and complexity of the

algorithms to a minimum. Ideally, I like program codes for the algorithms to be no

longer than a page of typed material, and at worst, less than three pages. This makes

it feasible to implement the algorithms in a single work session. However, even this

level of effort is likely to be considered tedious and it is unnecessary if the code can be

provided in a suitable form. Here we provide source code in Turbo Pascal for the

algorithms in this book and for the driver and support routines to run them (under

Turbo Pascal version 3.01a).

The philosophy behind this book remains one of working with available tools

rather than complaining that better ones exist, albeit not easily accessible. This

should not lead to complacency in dealing with the machine but rather to an active

wariness of any and every feature of the system. A number of these can and should be

checked by using programming devices which force the system to reveal itself in spite

of the declarations in the manual(s). Others will have to be determined by exploring

every error possibility when a program fails to produce expected results. In most

cases programmer error is to blame, but I have encountered at least one system error

in each of the systems I have used seriously. For instance, trigonometric functions are

usually computed by power series approximation. However, these approximations

have validity over specified domains, usually [0, π/2] or [0, π] (see Abramowitz and

Stegun 1965, p 76). Thus the argument of the function must first be transformed to


bring it into the appropriate range. For example

sin(π − φ) = sin(φ)    (1.1)

or

sin(π/2 − φ) = cos(φ)    (1.2)

Unless this range reduction is done very carefully the results may be quite

unexpected. On one system, hosted by a Data General NOVA, I have observed

that the sine of an angle near π and the cosine of an angle near π/2 were both

computed as unity instead of a small value, due to this type of error. Similarly, on

some early models of Hewlett-Packard pocket calculators, the rectangular-to-polar

coordinate transformation may give a vector 180° from the correct direction. (This

appears to have been corrected now.)
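The identities (1.1) and (1.2) can be checked numerically. A minimal sketch in Python (rather than the Turbo Pascal used for the book's algorithms), assuming a correctly range-reduced library sine and cosine; the test angle 0.3 is arbitrary:

```python
import math

# Identity (1.1): sin(pi - phi) = sin(phi)
# Identity (1.2): sin(pi/2 - phi) = cos(phi)
phi = 0.3
assert abs(math.sin(math.pi - phi) - math.sin(phi)) < 1e-12
assert abs(math.sin(math.pi / 2 - phi) - math.cos(phi)) < 1e-12
```

The tolerance is loose (1e-12) because math.pi is itself only a rounded approximation to π, so the two sides agree to within a few units in the last place rather than exactly.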

Testing the quality of the floating-point arithmetic and special functions is

technically difficult and tedious. However, some developments which aid the user

have been made by public-spirited scientists. Of these, I consider the most worthy

example to be PARANOIA, a program to examine the floating-point arithmetic

provided by a programming language translator. Devised originally by Professor W

Kahan of the University of California, Berkeley, it has been developed and distributed in a number of programming languages (Karpinski 1985). Its output is

didactic, so that one does not have to be a numerical analyst to interpret the results. I

have used the BASIC, FORTRAN, Pascal and C versions of PARANOIA, and have seen

reports of Modula-2 and ADA®† versions.

In the area of special functions, Cody and Waite (1980) have devised software to

both calculate and test a number of the commonly used transcendental functions

(sin, cos, tan, log, exp, sqrt, xy). The ELEFUNT testing software is available in their

book, written in FORTRAN. A considerable effort would be needed to translate it into

other programming languages.

An example from our own work is the program DUBLTEST, which is designed to

determine the precision to which the standard special functions in BASIC are

computed (Nash and Walker-Smith 1987). On the IBM PC, many versions of

Microsoft BASIC (GWBASIC, BASICA) would only compute such functions in single

precision, even if the variables involved as arguments or results were double

precision. For some nonlinear parameter estimation problems, the resulting low

precision results caused unsatisfactory operation of our software.

Since most algorithms are in some sense iterative, it is necessary that one has

some criterion for deciding when sufficient progress has been made that the

execution of a program can be halted. While, in general, I avoid tests which

require knowledge of the machine, preferring to use the criterion that no progress

has been made in an iteration, it is sometimes convenient or even necessary to

employ tests involving tolerances related to the structure of the computing device

at hand.

The most useful property of a system which can be determined systematically is

the machine precision. This is the smallest number, eps, such that

1 + eps > 1    (1.3)

† ADA is a registered name of the US Department of Defense.

6

Compact numerical methods for computers

within the arithmetic of the system. Two programs in FORTRAN for determining the

machine precision, the radix or base of the arithmetic, and machine rounding or

truncating properties have been given by Malcolm (1972). The reader is cautioned

that, since these programs make use of tests of conditions like (1.3), they may be

frustrated by optimising compilers which are able to note that (1.3) in exact

arithmetic is equivalent to

eps > 0.    (1.4)

Condition (1.4) is not meaningful in the present context. The Univac compilers

have acquired some notoriety in this regard, but they are by no means the only

offenders.
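As an illustration of condition (1.3), the machine precision can be found by repeated halving. This is a Python sketch (the book's codes are in Turbo Pascal), and in a compiled language an optimising translator could defeat the test exactly as described above; CPython evaluates it in stored double precision:

```python
def machine_eps():
    # Halve eps until 1 + eps/2 is no longer distinguishable from 1
    # within the machine arithmetic, then return the last eps that
    # still satisfied condition (1.3).
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps

print(machine_eps())  # on IEEE double precision: 2.220446049250313e-16
```

For IEEE double precision this returns 2^(1-53) = 2^-52, in agreement with equation (1.9) for R = 2 and t = 53.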

To find the machine precision and radix by using arithmetic of the computer

itself, it is first necessary to find a number q such that (1 + q ) and q are

represented identically, that is, the representation of 1 having the same exponent

as q has a digit in the (t + 1)th radix position where t is the number of radix digits

in the floating-point mantissa. As an example, consider a four decimal digit

machine. If q = 10,000 or greater, then q is represented as (say)

0·1 * 1E5

while 1 is represented as

0·00001 * 1E5.

The action of storing the five-digit sum

0·10001 * 1E5

in a four-digit word causes the last digit to be dropped. In the example,

q = 10 000 is the smallest number which causes (1 + q) and q to be represented

identically, but any number

q > 9999

will have the same property. If the machine under consideration has radix R, then
any

q > R^t    (1.5)

will have the desired property. If, moreover, q and R are represented so that

q < R^(t+1)    (1.6)

then

q + R > q.    (1.7)

In our example, R = 10 and t = 4 so the largest q consistent with (1.6) is

q = 10^5 − 10 = 99 990 = 0·9999 * 1E5

and

99 990 + 10 = 100 000 = 0·1000 * 1E6 > q.
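The four-decimal-digit machine of this example can be mimicked with Python's decimal module (an illustrative sketch, not one of the book's codes): a Context with prec=4 and ROUND_DOWN behaves like a four-digit machine that truncates.

```python
from decimal import Decimal, Context, ROUND_DOWN

# A four-decimal-digit machine that truncates (chops).
ctx = Context(prec=4, rounding=ROUND_DOWN)

# For q = 10 000, the stored sum (1 + q) loses its fifth digit,
# so (1 + q) and q are represented identically.
q = Decimal(10000)
assert ctx.add(q, Decimal(1)) == q

# The largest q consistent with (1.6) for R = 10, t = 4:
q = Decimal(99990)
assert ctx.add(q, Decimal(10)) > q                     # condition (1.7)
assert ctx.add(q, Decimal(10)) == Decimal(100000)      # 0.1000 * 1E6
```

The same Context can be used to experiment with the radix-finding procedure described next, without depending on the hardware arithmetic.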

Starting with a trial value, say q = 1, successive doubling will give some number

q = 2^k


such that (q + 1) and q are represented identically. By then setting r to successive

integers 2, 3, 4 , . . . , a value such that

q + r > q    (1.8)

will be found. On a machine which truncates, r is then the radix R. However, if

the machine rounds in some fashion, the condition (1.8) may be satisfied for r < R.

Nevertheless, the representations of q and (q + r) will differ by R. In the example,

doubling will produce q = 16 384 which will be represented as

0·1638 * 1E5

so q + r is represented as

0·1639 * 1E5

for some r ≤ 10. Then subtraction of these gives

0·0001 * 1E5 = 10.

Unfortunately, it is possible to foresee situations where this will not work.

Suppose that q = 99 990, then we have

0·9999 * 1E5 + 10 = 0·1000 * 1E6

and

0·1000 * 1E6 − 0·9999 * 1E5 = R′.

But if the second number in this subtraction is first transformed to

0·0999 * 1E6

then R′ is assigned the value 100. Successive doubling should not, unless the

machine arithmetic is extremely unusual, give q this close to the upper bound of

(1.6).

Suppose that R has been found and that it is greater than two. Then if the

representation of q + (R – 1) is greater than that of q, the machine we are using

rounds, otherwise it chops or truncates the results of arithmetic operations.

The number of radix digits t is now easily found as the smallest integer such

that

Rt + 1

is represented identically to Rt. Thus the machine precision is given as

eps = R1-t = R-(t-1).

(1.9)

In the example, R = 10, t = 4, so

R-3 = 0·001.

Thus

1 + 0·00l = 1·001 > 1

but 1 + 0·0009 is, on a machine which truncates, represented as 1.

In all of the previous discussion concerning the computation of the machine

precision it is important that the representation of numbers be that in the

8

Compact numerical methods for computers

memory, not in the working registers where extra digits may be carried. On a

Hewlett-Packard 9830, for instance, it was necessary when determining the

so-called ‘split precision’ to store numbers specifically in array elements to force

the appropriate truncation.

The above discussion has assumed a model of floating-point arithmetic which may

be termed an additive form in that powers of the radix are added together and the

entire sum multiplied by some power of the radix (the exponent) to provide the final

quantity representing the desired real number. This representation may or may not

be exact. For example, the fraction cannot be exactly represented in additive binary

(radix 2) floating-point arithmetic. While there are other models of floating-point

arithmetic, the additive form is the most common, and is used in the IEEE binary

and radix-free floating-point arithmetic standards. (The March, 1981, issue of IEEE

Computer magazine, volume 3, number 4, pages 51-86 contains a lucid description of

the binary standard and its motivations.)

If we are concerned with having absolute upper and lower bounds on computed

quantities, interval arithmetic is possible, but not commonly supported by programming languages (e.g. Pascal SC (Kulisch 1987)). Despite the obvious importance of

assured bounds on results, the perceived costs of using interval arithmetic have

largely prevented its widespread use.

The development of standards for floating-point arithmetic has the great benefit

that results of similar calculations on different machinery should be the same.

Furthermore, manufacturers have been prompted to develop hardware implementations of these standards, notably the Intel 80 x 87 family and the Motorola 68881

of circuit devices. Hewlett-- Packard implemented a decimal version of the IEEE 858

standard in their HP 71B calculator.

Despite such developments, there continues to be much confusion and misinformation concerning floating-point arithmetic. Because an additive decimal form of

arithmetic can represent fractions such as exactly, and in general avoid inputoutput conversion errors, developers of software products using such arithmetic

(usually in binary coded decimal or BCD form) have been known to claim that it has

'no round-off error', which is patently false. I personally prefer decimal arithmetic, in

that data entered into a calculation can generally be represented exactly, so that a

display of the stored raw data reproduces the input familiar to the user. Nevertheless,

the differences between good implementations of floating-point arithmetic, whether

binary or decimal, are rarely substantive.

While the subject of machine arithmetic is still warm, note that the mean of two

numbers may be calculated to be smaller or greater than either! An example in

four-figure decimal arithmetic will serve as an illustration of this.

a

b

a+b

(a + b) /2

Exact

Rounded

Truncated

5008

5007

10015

5007·5

5008

5007

1002 * 10

501·0 * 10

= 5010

5008

5007

1001 * 10

500·5 * 10

= 500.5

A starting point

9

That this can and does occur should be kept in mind whenever averages are

computed. For instance, the calculations are quite stable if performed as

(a + b)/2 = 5000+[(a – 5000) + (b – 5000)]/2.

Taking account of every eventuality of this sort is nevertheless extremely tedious.

Another annoying characteristic of small machines is the frequent absence of

extended precision, also referred to as double precision, in which extra radix digits

are made available for calculations. This permits the user to carry out arithmetic

operations such as accumulation, especially of inner products of vectors, and

averaging with less likelihood of catastrophic errors. On equipment which functions with number representations similar to the IBM/360 systems, that is, six

hexadecimal (R = 16) digits in the mantissa of each number, many programmers

use the so-called ‘double precision’ routinely. Actually t = 14, which is not double

six. In most of the calculations that I have been asked to perform, I have not

found such a sledgehammer policy necessary, though the use of this feature in

appropriate situations is extremely valuable. The fact that it does not exist on

most small computers has therefore coloured much of the development which

follows.

Finally, since the manufacturers’ basic software has been put in question above,

the user may also wonder about their various application programs and packages.

While there are undoubtedly some ‘good’ programs, my own experience is that the

quality of such software is very mixed. Badly written and poorly documented

programs may take longer to learn and understand than a satisfactory homegrown

product takes to code and debug. A major fault with many software products is that

they lack references to the literature and documentation of their pedigree and

authorship. Discussion of performance and reliability tests may be restricted to very

simple examples. Without such information, it may be next to impossible to

determine the methods used or the level of understanding of the programmer of the

task to be carried out, so that the user is unable to judge the quality of the product.

Some developers of mathematical and statistical software are beginning to recognise

the importance of background documentation, and their efforts will hopefully be

rewarded with sales.

1.3. SOURCES OF PROGRAMS

When the first edition of this book was prepared, there were relatively few sources of

mathematical software in general, and in essence none (apart from a few manufacturers’ offerings) for users of small computers. This situation has changed remarkably, with some thousands of suppliers. Source codes of numerical methods,

however, are less widely available, yet many readers of this book may wish to

conduct a search for a suitable program already coded in the programming language

to be used before undertaking to use one of the algorithms given later.

How should such a search be conducted? I would proceed as follows.

First, if FORTRAN is the programming language, I would look to the major

collections of mathematical software in the Collected Algorithms of the Association for

Computing Machinery (ACM). This collection, abbreviated as CALGO, is comprised

10

Compact numerical methods for computers

of all the programs published in the Communications of the ACM (up to 1975) and

the ACM Transactions on Mathematical Software (since 1975). Other important

collections are EISPACK, UNPACK, FUNPACK and MINPACK, which concern algebraic

eigenproblems, linear equations, special functions and nonlinear least squares minimisation problems. These and other packages are, at time of writing, available from

the Mathematics and Computer Sciences Division of the Argonne National Laboratory of the US Department of Energy. For those users fortunate enough to have

access to academic and governmental electronic mail networks, an index of software

available can be obtained by sending the message

SEND INDEX

to the pseudo-user NETLIB at node ANL-MCS on the ARPA network (Dongarra and

Grosse 1987). The software itself may be obtained by a similar mechanism.

Suppliers such as the Numerical Algorithms Group (NAG), International Mathematical and Statistical Libraries (IMSL), C Abaci, and others, have packages

designed for users of various computers and compilers, but provide linkable object

code rather than the FORTRAN source. C Abaci, in particular, allows users of the

Scientific Desk to also operate the software within what is termed a ‘problem solving

environment’ which avoids the need for programming.

For languages other than FORTRAN, less software is available. Several collections of

programs and procedures have been published as books, some with accompanying

diskettes, but once again, the background and true authorship may be lacking. The

number of truly awful examples of badly chosen, badly coded algorithms is alarming,

and my own list of these too long to include here.

Several sources I consider worth looking at are the following.

Maindonald (1984)

—A fairly comprehensive collection of programs in BASIC (for a Digital Equipment Corporation VAX computer) are presented covering linear estimation,

statistical distributions and pseudo-random numbers.

Nash and Walker-Smith (1987)

—Source codes in BASIC are given for six nonlinear minimisation methods and a

large selection of examples. The algorithms correspond, by and large, to those

presented later in this book.

LEQBO5 (Nash 1984b, 1985)

—This single ‘program’ module (actually there are three starting points for

execution) was conceived as a joke to show how small a linear algebra package

could be made. In just over 300 lines of BASIC is the capability to solve linear

equations, linear least squares, matrix inverse and generalised inverse, symmetric matrix eigenproblem and nonlinear least squares problems. The joke

back-fired in that the capability of this program, which ran on the Sinclair ZX81

computer among other machines, is quite respectable.

Kahaner, Moler and Nash (1989)

—This numerical analysis textbook includes FORTRAN codes which illustrate the

material presented. The authors have taken pains to choose this software for

A starting point

11

quality. The user must, however, learn how to invoke the programs, as there is

no user interface to assist in problem specification and input.

Press et al (1986) Numerical Recipes

—This is an ambitious collection of methods with wide capability. Codes are

offered in FORTRAN, Pascal, and C. However, it appears to have been only

superficially tested and the examples presented are quite simple. It has been

heavily advertised.

Many other products exist and more are appearing every month. Finding out

about them requires effort, the waste of which can sometimes be avoided by using

modern online search tools. Sadly, more effort is required to determine the quality of

the software, often after money has been spent.

Finally on sources of software, readers should be aware of the Association for

Computing Machinery (ACM) Transactions on Mathematical Software which publishes research papers and reports algorithms. The algorithms themselves are available after a delay of approximately 1 year on NETLIB and are published in full in the

Collected Algorithms of the ACM. Unfortunately, many are now quite large programs, and the Transactions on Mathematical Software (TOMS) usually only

publishes a summary of the codes, which is insufficient to produce a working

program. Moreover, the programs are generally in FORTRAN.

Other journals which publish algorithms in some form or other are Applied

Statistics (Journal of the Royal Statistical Society, Part C), the Society for Industrial

and Applied Mathematics (SIAM) journals on Numerical Analysis and on Scientific

and Statistical Computing, the Computer Journal (of the British Computer Society),

as well as some of the specialist journals in computational statistics, physics,

chemistry and engineering. Occasionally magazines, such as Byte or PC Magazine,

include articles with interesting programs for scientific or mathematical problems.

These may be of very variable quality depending on the authorship, but some

exceptionally good material has appeared in magazines, which sometimes offer the

codes in machine-readable form, such as the Byte Information Exchange (BIX) and

disk ordering service. The reader has, however, to be assiduous in verifying the

quality of the programs.

1.4. PROGRAMMING LANGUAGES USED AND STRUCTURED

PROGRAMMING

The algorithms presented in this book are designed to be coded quickly and easily for

operation on a diverse collection of possible target machines in a variety of

programming languages. Originally, in preparing the first edition of the book, I

considered presenting programs in BASIC, but found at the time that the various

dialects of this language were not uniform in syntax. Since then, International

Standard Minimal BASIC (IS0 6373/ 1984) has been published, and most commonly

available BASICS will run Minimal BASIC without difficulty. The obstacle for the user is

that Minimal BASIC is too limited for most serious computing tasks, in that it lacks

string and file handling capabilities. Nevertheless, it is capable of demonstrating all

the algorithms in this book.

12

Compact numerical methods for computers

As this revision is being developed, efforts are ongoing to agree an international

standard for Full BASIC. Sadly, in my opinion, these efforts do not reflect the majority

of existing commercial and scientific applications. which are coded in a dialect of

BASIC compatible with language processors from Microsoft Corporation or Borland

International (Turbo BASIC).

Many programmers and users do not wish to use BASIC, however, for reasons quite

apart from capability. They may prefer FORTRAN, APL, C, Pascal, or some other

programming language. On certain machinery, users may be forced to use the

programming facilities provided. In the 1970s, most Hewlett-Packard desktop

computers used exotic programming languages of their own, and this has continued

to the present on programmable calculators such as the HP 15C. Computers offering

parallel computation usually employ programming languages with special extensions

to allow the extra functionality to be exploited.

As an author trying to serve this fragmented market, I have therefore wanted to

keep to my policy of presenting the algorithms in step-and-description form.

However, implementations of the algorithms allow their testing, and direct publication of a working code limits the possibilities for typographical errors. Therefore,

in this edition, the step-and-description codes have been replaced by Turbo Pascal

implementations. A coupon for the diskette of the codes is included. Turbo Pascal

has a few disadvantages, notably some differences from International Standard

Pascal, but one of its merits (others are discussed in $1.6) is that it allows the

algorithms to be presented in a manner which is readable and structured.

In recent years the concepts of structured and modular programming have

become very popular, to the extent that one programming language (Modula-2) is

founded on such principles. The interested reader is referred to Kernighan and

Plauger (1974) or Yourdon (1975) for background, and to Riley (1988) for a more

modern exposition of these ideas. In my own work, I have found such concepts

extremely useful, and I recommend them to any practitioner who wishes to keep his

debugging and reprogramming efforts to a minimum. Nevertheless, while modularity is relatively easy to impose at the level of individual tasks such as the decomposition of a matrix or the finding of the minimum of a function along a line, it is not

always reasonable to insist that the program avoid GOTO instructions. After all, in

aimimg to keep memory requirements as low as possible, any program code which

can do double duty is desirable. If possible, this should be collected into a

subprogram. In a number of cases this will not be feasible, since the code may have to

be entered at several points. Here the programmer has to make a judgement between

compactness and readability of his program. I have opted for the former goal when

such a decision has been necessary and have depended on comments and the essential

shortness of the code to prevent it from becoming incomprehensible.

The coding of the algorithms in the book is not as compact as it might be in a

specific application. In order to maintain a certain generality, I have chosen to allow

variables and parameters to be passed to procedures and functions from fairly

general driver programs. If algorithms are to be placed in-line in applications, it is

possible to remove some of this program ‘glue’. Furthermore, some features may not

always be necessary, for example, computation of eigenvectors in the Jacobi method

for eigensolutions of a real symmetric matrix (algorithm 14).

A starting point

13

It should also be noted that I have taken pains to make it easy to save a ‘copy’ of

the screen output to a file by duplicating all the output statements, that is the ‘write’

and ‘writeln’ commands, so that output is copied to a file which the user may name.

(These statements are on the disk files, but deleted from the listings to reduce space

and improve readability.) Input is allowed from an input file to allow examples to be

presented without the user needing to type the appropriate response other than the

name of the relevant ‘example’ file.

Furthermore, I have taken advantage of features within the MS-DOS operating

system, and supported by compiler directives in Turbo Pascal, which allow for

pipelining of input and output. This has allowed me to use batch files to automate the

running of tests.

In the driver programs I have tried to include tests of the results of calculations, for

example, the residuals in eigenvalue computations. In practice, I believe it is

worthwhile to perform these calculations. When memory is at a premium, they can

be performed ‘off-line’ in most cases. That is. the results can be saved to disk

(backing storage) and the tests computed as a separate task, with data brought in

from the disk only as needed.

These extra features use many extra bytes of code, but are, of course, easily

deleted. Indeed, for specific problems, 75% or more of the code can be removed.

1.5. CHOICE OF ALGORITHMS

The algorithms in this book have been chosen for their utility in solving a variety of

important problems in computational mathematics and their ease of implementation

to short programs using relatively little working storage. Many topics are left out,

despite their importance, because I feel that there has been insufficient development in

directions relevant to compact numerical methods to allow for a suitable algorithm

to be included. For example, over the last 15 years I have been interested in methods

for the mathematical programming problem which do not require a tableau to be

developed either explicitly or implicitly, since these techniques are generally quite

memory and code intensive. The appearance of the interior point methods associated

with the name of Karmarkar (1984) hold some hope for the future, but currently the

programs are quite complicated.

In the solution of linear equations, my exposition has been confined to Gauss

elimination and the Choleski decomposition. The literature on this subject is,

however, vast and other algorithms exist. These can and should be used if special

circumstances arise to make them more appropriate. For instance, Zambardino

(1974) presents a form of Gauss elimination which uses less space than the one

presented here. This procedure, in ALGOL, is called QUARTERSOLVE because only

n 2/4 elements are stored, though an integer vector is needed to store pivot

information and the program as given by Zambardino is quite complicated.

Many special methods can also be devised for matrices having special structures

such as diagonal bands. Wilkinson and Reinsch (1971) give several such algorithms for both linear equations and the eigenvalue problem. The programmer

with many problems of a special structure should consider these. However, I have

found that most users want a reliable general-purpose method for linear equations

14

Compact numerical methods for computers

because their day-to-day problems vary a great deal. I have deliberately avoided

including a great many algorithms in this volume because most users will likely be

their own implementors and not have a great deal of time to spend choosing,

coding, testing and, most of all, maintaining programs.

Another choice which has been made is that of only including algorithms which

are relatively ‘small’ in the sense that they fit into the machine all at once. For

instance, in the solution of the algebraic eigenvalue problem on large computers,

conventionally one reduces the matrix to a special form such as a tridiagonal or a

Hessenberg matrix, solves the eigenproblem of the simpler system then backtransforms the solutions. Each of the three phases of this procedure could be

fitted into a small machine. Again, for the practitioner with a lot of matrices to

solve or a special requirement for only partial solution, such methods should be

employed. For the one-at-a-time users, however, there is three times as much

program code to worry about.

The lack of double-precision arithmetic on the machines I used to develop the

algorithms which are included has no doubt modified my opinions concerning

algorithms. Certainly, any algorithm requiring inner products of vectors, that is

(1.10)

cannot be executed as accurately without extended-precision arithmetic (Wilkinson 1963). This has led to my abandonment of algorithms which seek to find the

minimum of a function along a line by use of gradient information. Such

algorithms require the derivative along the line and employ an inner product to

compute this derivative. While these methods are distinctly unreliable on a

machine having only a single, low-precision arithmetic, they can no doubt be used

very effectively on other machines.

From the above discussion it will be evident that the principles guiding

algorithm selection have been:

(i) shortness of program which results from implementation and low storage

requirement, and

(ii) general utility of the method and importance of the problem which it solves.

To these points should be added:

(iii) proven reliability in a number of tests

(iv) the ease and speed with which a user can obtain useful results from the

algorithms.

The third point is very important. No program should be considered acceptable until

it has been tested fairly extensively. If possible, any method which gives solutions

that can be checked by computing diagnostics should compute such information

routinely. For instance, I have had users of my eigenvalue/eigenvector programs call

me to say, ‘Your program doesn’t work!’ In all cases to date they have been

premature in their judgement, since the residuals computed as a routine adjunct to

the eigensolution formation have shown the output to be reasonable even though it

might be very different from what the user expected. Furthermore, I have saved

A starting point

15

myself the effort of having to duplicate their calculation to prove the correctness of

the results. Therefore, if at all possible, such checks are always built into my

programs.

The fourth point is important if users are to be able to try out the ideas presented

in this book. As a user myself, I do not wish to spend many hours mastering the

details of a code. The programs are to be treated as tools, not an end in themselves.

These principles lead to the choice of the Givens’ reduction in algorithm 4 as a

method for the solution of least-squares problems where the amount of data is too

great to allow all of it to be stored in the memory at once. Similarly, algorithms 24

and 25 require the user to provide a rule for the calculation of the product of a

matrix and a vector as a step in the solution of linear equations or the algebraic

eigenproblem. However, the matrix itself need not be stored explicitly. This

avoids the problem of designing a special method to take advantage of one type of

matrix, though it requires rather more initiative from the user as it preserves this

measure of generality.

In designing the particular forms of the algorithms which appear, a conscious

effort has been made to avoid the requirement for many tolerances. Some

machine-dependent quantities are unfortunately needed (they can in some cases

be calculated by the program but this does lengthen the code), but as far as

possible, and especially in determining when iterative algorithms have converged,

devices are used which attempt to ensure that as many digits are determined as

the machine is able to store. This may lead to programs continuing to execute long

after acceptable answers have been obtained. However, I prefer to sin on the side

of excess rather than leave results wanting in digits. Typically, the convergence

test requires that the last and present iterates be identical to the precision of the

machine by means of a test such as

if x + delta + offset = x + offset then halt;

where offset is some modest number such as 10. On machines which have an

accumulator with extra digits, this type of test may never be satisfied, and must be

replaced by

y: = x + delta + offset;

z: = x + offset;

if y = z then halt;

The ‘tolerance’ in this form of test is provided by the offset: within the computer the

representations of y and z must be equal to halt execution. The simplicity of this type

of test usually saves code though, as mentioned, at the possible expense of execution

time.

1.6. A METHOD FOR EXPRESSING ALGORITHMS

In the first edition of this work, algorithms were expressed in step-and-description

form. This allowed users to program them in a variety of programming languages.

Indeed, one of the strengths of the first edition was the variety of implementations.

At the time it appeared, a number of computers had their own languages or dialects,

16

Compact numerical methods for computer

and specialisation to one programming language would have inhibited users of these

special machines. Now, however, computer users are unlikely to be willing to type in

code if a machine-readable form of an algorithm exists. Even if the programming

language must be translated. having a workable form is useful as a starting point.

The original codes for the first edition were in BASIC for a Data General NOVA.

Later these codes were made to run on a North Star Horizon. Some were made to

work on a Hewlett-Packard 9830A. Present BASIC versions run on various common

microcomputers under the Microsoft BASIC dialect; however, since I have used very

conservative coding practices, apart from occasional syntactic deviations, they

conform to IS0 Minimal BASIC (IS0 6373-1984).

Rather than proof-reading the algorithms for the first edition, I re-coded them in

FORTRAN. These codes exist as NASHLIB, and were and are commercially available

from the author. I have not, however, been particularly satisfied that the FORTRAN

implementation shows the methods to advantage, since the structure of the algorithms seems to get lost in the detail of FORTRAN code. Also, the working parts of the

codes are overshadowed by the variable definitions and subroutine calls. Compact

methods are often best placed in-line rather than called as standard subprograms as I

have already indicated.

In the current edition, I want to stay close to the original step-and-description

form of the algorithms, but nevertheless wish to offer working codes which could be

distributed in machine-readable form. I have chosen to present the algorithms in

Borland Turbo Pascal. This has the following justification.

(i) Pascal allows comments to be placed anywhere in the code, so that the

original style for the algorithms, except for the typographic conventions, could be

kept.

(ii) Turbo Pascal is available for many machines and is relatively inexpensive. It

is used as a teaching tool at many universities and colleges, including the University

of Ottawa. Version 3.01a of the Turbo Pascal system was used in developing the

codes which appear here. I intend to prepare versions of the codes to run under later

versions of this programming environment.

(iii) The differences between Turbo and Standard Pascal are unlikely to be

important for the methods, so that users of other Pascal compilers can also use these

codes.

(iv) Pascal is ‘close enough’ to many other programming languages to allow for

straightforward translation of the codes.

A particular disadvantage of Turbo Pascal for my development work is that I have

yet to find a convenient mechanism allowing automatic compilation and execution of

codes, which would permit me to check a complete set of code via batch execution.

From the perspective of the physical length of the listings, the present algorithms are

also longer than I would like because Pascal requires program headings and

declarations. In the procedural parts of the codes, ‘begin’ and ‘end’ statements also

add to the line count to some extent.

From the user perspective, the requirement that matrix sizes be explicitly specified

can be a nuisance. For problems with varying matrix sizes it may be necessary to

compile separate versions of programs.

A starting point

17

Section 1.8 notes some other details of algorithm expression which relate to the

ease of use of the codes.

1.7. GENERAL NOTATION

I have not attempted to avoid re-use of symbols within this work since this would

have required an over-large set of symbols. In fact, I have used greek letters as

little as possible to save my typists’ and typesetters’ effort. However, within

chapters and within a subject area the symbols should be consistent. There follow

some brief general points on notation.

(i) Absolute value is denoted by vertical bars about a quantity, | |.

(ii) The norm of a quantity is denoted by double vertical bars, || ||. The form of

this must be found, where necessary, from the context.

(iii) A closed interval [u, v] comprises all points x such that u < x < v. An open

interval (u, v) comprises all points x such that u < x < v.

(iv) The exponents of decimal numbers will be expressed using the symbol E as in

1·234 * 10-5 = 1·234E-5

and

6·78 * 102 = 678 = 6·78E2.

This notation has already appeared in §1.2.

1.8. SOFTWARE ENGINEERING ISSUES

The development of microcomputer software for users who are not trained in

computer science or related subjects has given rise to user interfaces which are much

more sophisticated and easy to use than were common when the first edition

appeared. Mathematical software has not escaped such developments, but source

code collections have generally required the user to consolidate program units, add

driver and user routines, and compile and run the programs. In my experience, the

lack of convenience implied by this requirement is a major obstacle to users learning

about software published in source code form. In our nonlinear estimation software

(Nash and Walker-Smith 1987), we were careful to include batch command files to

allow for easy consolidation and execution of programs. This philosophy is continued here, albeit adapted to Turbo Pascal.

1. All driver programs include code (from the fragment in file startup.pas) to

allow the user to specify a file from which the program control input and the

problem data are to be input. We refer to this as a ‘control input file’. It has a

name stored in the global string variable infname, and is referred to by the

global text variable infile. Algorithms which need input get it by read or readln

statements from infile. The input file can be ‘con’, the console keyboard.

WARNING: errors in input control files may cause source code files to be destroyed. I

believe this is a ‘bug’ in Turbo Pascal 3.01a, the version used to develop the codes.

18

Compact numerical methods for computers

The use of an include file which is not a complete procedure or function is not

permitted by Turbo Pascal 5.0.

2. The same program code (startup.pas) allows an output file to be specified so

that all output which appears on the console screen is copied to a file. The name

for this file is stored in the global variable confname, and the file is referred to in

programs by the global text variable confile. Output is saved by the crude but

effective means of duplicating every write(. . .) and writeln(. . .) statement with

equivalent write(confile, . . .) and writeln(confile, . . .) statements.

3. To make the algorithms less cluttered, these write and writeln statements to

confile do not appear in the listings. They are present in the files on diskette.

4. To discourage unauthorised copying of the diskette files, all commentary and

documentation of the algorithm codes has been deleted.

5. To allow for execution of the programs from operating system commands (at

least in MS-DOS), compiler directives have been included at the start of all

driver programs. Thus, if a compiled form of code dr0102.pas exists as

dr0102.com, and a file dr0102x contains text equivalent to the keyboard input

needed to correctly run this program, the command

dr0102 < dr0102x

will execute the program for the given data.


Chapter 2

FORMAL PROBLEMS IN LINEAR ALGEBRA

2.1. INTRODUCTION

A great many practical problems in the scientific and engineering world give rise

to models or descriptions of reality which involve matrices. In consequence, a very

large proportion of the literature of numerical mathematics is devoted to the

solution of various matrix equations. In the following sections, the major formal

problems in numerical linear algebra will be introduced. Some examples are

included to show how these problems may arise directly in practice. However, the

formal problems will in most cases occur as steps in larger, more difficult

computations. In fact, the algorithms of numerical linear algebra are the keystones of numerical methods for solving real problems.

Matrix computations have become a large area for mathematical and computational research. Textbooks on this subject, such as Stewart (1973) and Strang

(1976), offer a foundation useful for understanding the uses and manipulations of

matrices and vectors. More advanced works detail the theorems and algorithms for

particular situations. An important collection of well-referenced material is Golub

and Van Loan (1983). Kahaner, Moler and Nash (1989) contains a very readable

treatment of numerical linear algebra.

2.2. SIMULTANEOUS LINEAR EQUATIONS

If there are n known relationships

Ai1x1 + Ai2x2 + . . . + Ainxn = bi        i = 1, 2, . . . , n        (2.1)

between the n quantities xj with the coefficients Aij and right-hand side elements

bi, i = 1, 2, . . . , n, then (2.1) is a set of n simultaneous linear equations in n

unknowns xj, j = 1, 2, . . . , n. It is simpler to write this problem in matrix form

Ax = b

(2.2)

where the coefficients have been collected into the matrix A, the right-hand side is

now the vector b and the unknowns have been collected as the vector x. A further

way to write the problem is to collect each column of A (say the jth) into a column

vector (i.e. aj). Then we obtain

x1a1 + x2a2 + . . . + xnan = b.        (2.3)
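The equivalence of the matrix and column forms is easy to exercise numerically. A minimal sketch (in Python with NumPy, purely for illustration, since the book's own codes are in Pascal and BASIC; the 3 by 3 system is invented):

```python
import numpy as np

# Invented 3 by 3 system A x = b, as in (2.2)
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([3.0, 5.0, 3.0])

x = np.linalg.solve(A, b)

# The column form (2.3): b is the combination x1*a1 + x2*a2 + x3*a3
recon = sum(x[j] * A[:, j] for j in range(3))
print(x)                       # the solution vector
print(np.allclose(recon, b))   # True: both forms agree
```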

Numerous textbooks on linear algebra, for instance Mostow and Sampson

(1969) or Finkbeiner (1966), will provide suitable reading for anyone wishing to


learn theorems and proofs concerning the existence of solutions to this problem.

For the purposes of this monograph, it will suffice to outline a few basic properties

of matrices as and when required.

Consider a set of n vectors of length n, that is

a1, a2, . . . , an.

(2.4)

These vectors are linearly independent if there exists no set of parameters

xj, j = 1, 2, . . . , n (not all zero), such that

x1a1 + x2a2 + . . . + xnan = 0        (2.5)

where 0 is the null vector having all components zero. If the vectors aj are now

assembled to make the matrix A and are linearly independent, then it is always

possible to find an x such that (2.2) is satisfied. Other ways of stating the

condition that the columns of A are linearly independent are that A has full rank

or

rank(A) = n

(2.6)

or that A is non-singular.

If only k < n of the vectors are linearly independent, then

rank(A) = k

(2.7)

and A is singular. In general (2.2) cannot be solved if A is singular, though

consistent systems of equations exist where b belongs to the space spanned by

{aj: j = 1, 2, . . . , n}.

In practice, it is useful to separate linear-equation problems into two categories.

(The same classification will, in fact, apply to all problems involving matrices.)

(i) The matrix A is of modest order with probably few zero elements (dense).

(ii) The matrix A is of high order and has most of its elements zero (sparse).

The methods presented in this monograph for large matrices do not specifically

require sparsity. The question which must be answered when computing on a small

machine is, ‘Does the matrix fit in the memory available?’

Example 2.1. Mass - spectrograph calibration

To illustrate a use for the solution of a system of linear equations, consider the

determination of the composition of a mixture of four hydrocarbons using a mass

spectrograph. Four lines will be needed in the spectrum. At these lines the

intensity for the sample will be bi, i = 1, 2, 3, 4. To calibrate the instrument,

intensities Aij for the ith line using a pure sample of the j th hydrocarbon are

measured. Assuming additive line intensities, the composition of the mixture is

then given by the solution x of

Ax = b.

Example 2.2. Ordinary differential equations: a two-point boundary-value problem

Large sparse sets of linear equations arise in the numerical solution of differential


equations. Fröberg (1965, p 256) considers the differential equation

y'' + y/(1 + x²) = 7x

(2.8)

with the boundary conditions

y = 0        for x = 0        (2.9)

y = 2        for x = 1.        (2.10)

To solve this problem numerically, Fröberg replaces the continuum in x on the

interval [0, 1] with a set of (n + 1) points, that is, the step size on the grid is

h = 1/n. The second derivative is therefore replaced by the second difference at

point j

(yj+1 – 2yj + yj–1)/h².        (2.11)

The differential equation (2.8) is therefore approximated by a set of linear

equations of which the jth is

(yj+1 – 2yj + yj–1)/h² + yj/(1 + xj²) = 7xj        (2.12)

or, with xj = jh,

yj+1 + [h²/(1 + xj²) – 2]yj + yj–1 = 7xj h².        (2.13)

Because y0 = 0 and yn = 2, this set of simultaneous linear equations is of order

(n - 1). However, each row involves at most three of the values yj. Thus, if the

order of the set of equations is large, the matrix of coefficients is sparse.
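The discretisation above can be sketched in code. The following (Python with NumPy for illustration; the value of n and the dense storage are choices made here, not Fröberg's) builds and solves the order-(n – 1) system:

```python
import numpy as np

n = 10                       # grid intervals, so h = 1/n (a choice made here)
h = 1.0 / n
xg = np.arange(1, n) * h     # interior points x_j = j*h, j = 1, ..., n-1

# Row j of the system: y_{j+1} + (h^2/(1+x_j^2) - 2) y_j + y_{j-1} = 7 x_j h^2
A = np.zeros((n - 1, n - 1))
bvec = 7.0 * xg * h * h
for j in range(n - 1):
    A[j, j] = h * h / (1.0 + xg[j] ** 2) - 2.0
    if j > 0:
        A[j, j - 1] = 1.0    # coefficient of y_{j-1}
    if j < n - 2:
        A[j, j + 1] = 1.0    # coefficient of y_{j+1}

# Boundary values y(0) = 0 and y(1) = 2 move known terms to the right-hand side;
# the y_0 = 0 term contributes nothing
bvec[-1] -= 2.0

y = np.linalg.solve(A, bvec)
print(len(y))                # the n-1 interior values
```

A band or sparse solver would exploit the tridiagonal structure; dense storage is used only to keep the sketch short.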

2.3. THE LINEAR LEAST-SQUARES PROBLEM

As described above, n linear equations give relationships which permit n parameters to be determined if the equations give rise to linearly independent coefficient

vectors. If there are more than n conditions, say m, then all of them may not

necessarily be satisfied at once by any set of parameters x. By asking a somewhat

different question, however, it is possible to determine solutions x which in some

way approximately satisfy the conditions. That is, we wish to write

Ax ≈ b        (2.14)

where the sense of the ≈ sign is yet to be defined.

By defining the residual vector

r = b – Ax

(2.15)

we can express the lack of approximation for a given x by the norm of r

|| r ||.

This must fulfil the following conditions:

|| r || > 0        for r ≠ 0        (2.16)

|| 0 || = 0        (2.17)

|| cr || = |c| · || r ||        (2.18)

for an arbitrary complex number c, and

|| r + s || ≤ || r || + || s ||        (2.19)

where s is a vector of the same order as r (that is, m).

Condition (2.19) is called the triangle inequality since the lengths of the sides of

a triangle satisfy this relationship. While there exist many norms, only a few are of

widespread utility, and by and large in this work only the Euclidean norm

|| r ||E = (rT r)½        (2.20)

will be used. The superscript T denotes transposition, so the norm is a scalar. The

square of the Euclidean norm of r

rT r = r1² + r2² + . . . + rm²        (2.21)

is appropriately called the sum of squares. The least-squares solution x of (2.14) is

that set of parameters which minimises this sum of squares. In cases where

rank(A) < n this solution is not unique. However, further conditions may be

imposed upon the solution to ensure uniqueness. For instance, it may be required

that in the non-unique case, x shall be that member of the set of vectors which

minimise rT r which also has xT x a minimum. In this case x is the unique

minimum-length least-squares solution.

If the minimisation of r T r with respect to x is attempted directly, then using

(2.15) and elementary calculus gives

AT Ax = AT b        (2.22)

as the set of conditions which x must satisfy. These are simply n simultaneous

linear equations in n unknowns x and are called the normal equations. Solution of

the least-squares problem via the normal equations is the most common method

by which such problems are solved. Unfortunately, there are several objections to

such an approach if it is not carefully executed, since the special structure of ATA

and the numerical instabilities which attend its formation are ignored at the peril

of meaningless computed values for the parameters x.
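The contrast between the normal-equations route and a route that avoids forming AT A can be seen in a small sketch (Python with NumPy for illustration; the data are random and well conditioned, so the two routes agree here):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 3
A = rng.standard_normal((m, n))   # invented overdetermined data
b = rng.standard_normal(m)

# Route 1: form and solve the normal equations (2.22)
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# Route 2: an orthogonalisation-based solver that never forms A^T A
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

r = b - A @ x_ls                  # residual (2.15)
print(np.allclose(x_ne, x_ls))    # True for this well-conditioned A
print(np.allclose(A.T @ r, 0, atol=1e-10))  # residual orthogonal to columns of A
```

For nearly collinear columns the two routes can diverge sharply, which is the point of the warnings above.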

Firstly, any matrix B such that

xT Bx > 0        for all x ≠ 0        (2.23)

is called positive definite. If

xT Bx ≥ 0        (2.24)

for all x, B is non-negative definite or positive semidefinite. The last two terms are

synonymous. The matrix AT A gives the quadratic form

Q = xTAT Ax

(2.25)

for any vector x of order n. By setting

y = Ax        (2.26)

we have

Q = yT y ≥ 0        (2.27)

so that AT A is non-negative definite. In fact, if the columns of A are linearly

independent, it is not possible for y to equal the order-m null vector 0, so that in

this case AT A is positive definite. This is also called the full-rank case.

Secondly, many computer programs for solving the linear least-squares problem

ignore the existence of special algorithms for the solution of linear equations

having a symmetric, positive definite coefficient matrix. Above it has already been

established that AT A is positive definite and symmetry is proved trivially. The

special algorithms have advantages in efficiency and reliability over the methods

for the general linear-equation problem.

Thirdly, in chapter 5 it will be shown that the formation of AT A can lead to loss

of information. Techniques exist for the solution of the least-squares problem

without recourse to the normal equations. When there is any question as to the

true linear independence of the columns of A, these have the advantage that they

permit the minimum-length least-squares solution to be computed.

It is worth noting that the linear-equation problem of equation (2.2) can be

solved by treating it as a least-squares problem. Then for singular matrices A

there is still a least-squares solution x which, if the system of equations is

consistent, has a zero sum of squares r T r. For small-computer users who do not

regularly need solutions to linear equations or whose equations have coefficient

matrices which are near-singular (ill conditioned is another way to say this), it is

my opinion that a least-squares solution method which avoids the formation of

AT A is useful as a general approach to the problems in both equations (2.2) and

(2.14).

As for linear equations, linear least-squares problems are categorised by

whether or not they can be stored in the main memory of the computing device at

hand. Once again, the traditional terms dense and sparse will be used, though

some problems having m large and n reasonably small will have very few zero

entries in the matrix A.

Example 2.3. Least squares

It is believed that in the United States there exists a linear relationship between

farm money income and the agricultural use of nitrogen, phosphate, potash and

petroleum. A model is therefore formulated using, for simplicity, a linear form

(money income) = x1 + x2 (nitrogen) + x3 (phosphate) + x4 (potash) + x5 (petroleum).

(2.28)

For this problem the data are supplied as index numbers (1940 = 100) to avoid

difficulties associated with the units in which the variables are measured. By

collecting the values for the dependent variable (money income) as a vector b and

the values for the other variables as the columns of a matrix A including the

constant unity which multiplies x1, a problem

Ax ≈ b

(2.14)

is obtained. The data and solutions for this problem are given as table 3.1 and

example 3.2.


Example 2.4. Surveying-data fitting

Consider the measurement of height differences by levelling (reading heights off a

vertical pole using a levelled telescope). This enables the difference between the

heights of two points i and j to be measured as

bij = hi – hj + rij

(2.29)

where rij is the error made in taking the measurement. If m height differences are

measured, the best set of heights h is obtained as the solution to the least-squares

problem

minimise rT r

(2.30)

where

r = b – Ah

and each row of A has only two non-zero elements, 1 and -1, corresponding to

the indices of the two points involved in the height-difference measurement. Sometimes the error is defined as the weighted residual

rij = [bij – (hi – hj)]dij

where dij is the horizontal distance between the two points (that is, the measurement error increases with distance).

A special feature of this particular problem is that the solution is only

determined to within a constant, h0, because no origin for the height scale has

been specified. In many instances, only relative heights are important, as in a

study of subsidence of land. Nevertheless, the matrix A is rank-deficient, so any

method chosen to solve the problem as it has been presented should be capable of

finding a least-squares solution despite the singularity (see example 19.2).
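A tiny levelling network makes the rank deficiency concrete. This sketch (Python with NumPy for illustration; the network, the heights and the use of noise-free measurements are all invented) shows the minimum-length least-squares solution fixing the free constant h0 implicitly:

```python
import numpy as np

# Invented levelling network: 4 points, 5 measured height differences.
# Row (i, j) of A has +1 in column i and -1 in column j -- see (2.29).
pairs = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
true_h = np.array([10.0, 12.0, 11.0, 15.0])
A = np.zeros((len(pairs), 4))
for row, (i, j) in enumerate(pairs):
    A[row, i] = 1.0
    A[row, j] = -1.0
b = A @ true_h                      # noise-free measurements, for illustration

# rank(A) = 3 < 4: heights are determined only up to a constant h0
print(np.linalg.matrix_rank(A))     # 3

# lstsq returns the minimum-length least-squares solution
h, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(A @ h, b))        # differences reproduced exactly
print(np.allclose(h - h.mean(), true_h - true_h.mean()))  # heights agree up to a shift
```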

2.4. THE INVERSE AND GENERALISED INVERSE OF A MATRIX

An important concept is that of the inverse of a square matrix A. It is defined as

that square matrix, labelled A-1, such that

A-1A = AA-1 = 1n        (2.31)

where 1n is the unit matrix of order n. The inverse exists only if A has full rank.

Algorithms exist which compute the inverse of a matrix explicitly, but these are of

value only if the matrix inverse itself is useful. These algorithms should not be

used, for instance, to solve equation (2.2) by means of the formal expression

x = A-1b

(2.32)

since this is inefficient. Furthermore, the inverse of a matrix A can be computed

by setting the right-hand side b in equation (2.2) to the n successive columns of

the unit matrix 1 n. Nonetheless, for positive definite symmetric matrices, this

monograph presents a very compact algorithm for the inverse in §8.2.

When A is rectangular, the concept of an inverse must be generalised. Corresponding to (2.32) consider solving equation (2.14) by means of a matrix A+, yet to

be defined, such that

(2.33)

x = A+ b.


In other words, we have

A+ A = 1n.        (2.34)

When A has only k linearly independent columns, it will be satisfactory if

A+ A = ( 1k  0 )
       ( 0   0 )        (2.35)

but in this case x is not defined uniquely since it can contain arbitrary components

from the orthogonal complement of the space spanned by the columns of A. That

is, we have

(2.36)

x = A+ b + (1 n – A + A) g

where g is any vector of order n.

The normal equations (2.22) must still be satisfied. Thus in the full-rank case, it

is straightforward to identify

A+ = (AT A)-1AT.        (2.37)

In the rank-deficient case, the normal equations (2.22) imply by substitution of

(2.36) that

AT Ax = AT AA+ b + (AT A – AT AA+ A)g
      = AT b.        (2.38)

If

AT AA+ = AT

(2.39)

then equation (2.38) is obviously true. By requiring A+ to satisfy

AA + A = A

(2.40)

(AA+)T = AA+

(2.41)

and

this can indeed be made to happen. The proposed solution (2.36) is therefore a

least-squares solution under the conditions (2.40) and (2.41) on A+. In order that

(2.36) gives the minimum-length least-squares solution, it is necessary that xT x be

minimal also. But from equation (2.36) we find

xT x = bT(A+)T A+ b + gT(1 – A+ A)T(1 – A+ A)g + 2gT(1 – A+ A)T A+ b        (2.42)

which can be seen to have a minimum at

g = 0        (2.43)

if

(1 – A+ A)T

is the annihilator of A+ b, thus ensuring that the two contributions (that is, from b

and g) to x T x are orthogonal. This requirement imposes on A + the further

conditions

A+ AA + = A+

(2.44)

(A+ A)T = A+ A.

(2.45)

The four conditions (2.40), (2.41), (2.44) and (2.45) were proposed by Penrose

(1955). The conditions are not, however, the route by which A + is computed.

Here attention has been focused on one generalised inverse, called the Moore-Penrose inverse. It is possible to relax some of the four conditions and arrive at

other types of generalised inverse. However, these will require other conditions to

be applied if they are to be specified uniquely. For instance, it is possible to

consider any matrix which satisfies (2.40) and (2.41) as a generalised inverse of A

since it provides, via (2.33), a least-squares solution to equation (2.14). However,

in the rank-deficient case, (2.36) allows arbitrary components from the null space

of A to be added to this least-squares solution, so that the two-condition generalised inverse is specified incompletely.

Over the years a number of methods have been suggested to calculate ‘generalised

inverses’. Having encountered some examples of dubious design, coding or applications of such methods, I strongly recommend testing computed generalised inverse

matrices to ascertain the extent to which conditions (2.40), (2.41), (2.44) and (2.45)

are satisfied (Nash and Wang 1986).
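That testing recommendation is easy to follow mechanically. A sketch (Python with NumPy for illustration; the rank-deficient test matrix is invented) measuring how far a claimed generalised inverse is from satisfying the four conditions:

```python
import numpy as np

def penrose_errors(A, Aplus):
    """Largest violations of conditions (2.40), (2.41), (2.44) and (2.45)
    for a claimed generalised inverse Aplus of A."""
    return (np.abs(A @ Aplus @ A - A).max(),
            np.abs((A @ Aplus).T - A @ Aplus).max(),
            np.abs(Aplus @ A @ Aplus - Aplus).max(),
            np.abs((Aplus @ A).T - Aplus @ A).max())

# Invented rank-deficient test matrix (third column = first + second)
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 9.0],
              [7.0, 8.0, 15.0],
              [1.0, 0.0, 1.0]])
errs = penrose_errors(A, np.linalg.pinv(A))
print(errs)    # all four should be near machine precision
```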

2.5. DECOMPOSITIONS OF A MATRIX

In order to carry out computations with matrices, it is common to decompose

them in some way to simplify and speed up the calculations. For a real m by n

matrix A, the QR decomposition is particularly useful. This equates the matrix A

with the product of an orthogonal matrix Q and a right- or upper-triangular

matrix R, that is

A = QR

(2.46)

where Q is m by m and

QT Q = QQT = 1m        (2.47)

and R is m by n with all elements

Rij = 0

for i > j.

(2.48)

The QR decomposition leads to the singular-value decomposition of the matrix A

if the matrix R is identified with the product of a diagonal matrix S and an orthogonal matrix VT, that is

R = SVT

(2.49)

where the m by n matrix S is such that

Sij = 0        for i ≠ j        (2.50)

and V, n by n, is such that

VT V = VVT = 1n.        (2.51)


Note that the zeros below the diagonal in both R and S imply that, apart from

orthogonality conditions imposed by (2.47), the elements of columns (n + 1),

(n + 2), . . . , m of Q are arbitrary. In fact, they are not needed in most calculations, so will be dropped, leaving the m by n matrix U, where

UT U = 1n .

(2.52)

Note that it is no longer possible to make any statement regarding UUT. Omitting

rows (n + 1) to m of both R and S allows the decompositions to be written as

A = UR = USVT

(2.53)

where A is m by n, U is m by n and UT U = 1n, R is n by n and upper-triangular, S

is n by n and diagonal, and V is n by n and orthogonal. In the singular-value

decomposition the diagonal elements of S are chosen to be non-negative.
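The shortened ('economy') form of (2.53) can be verified directly. A sketch (Python with NumPy for illustration; the 3 by 2 matrix is invented):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])          # m = 3, n = 2

# 'Economy' SVD: U is m by n, s holds the n singular values, Vt is V^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(U.T @ U, np.eye(2)))        # U^T U = 1_n, see (2.52)
print(np.allclose(A, U @ np.diag(s) @ Vt))    # A = U S V^T, see (2.53)
print(bool(np.all(s[:-1] >= s[1:]) and np.all(s >= 0)))  # non-negative, ordered
```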

Both the QR and singular-value decompositions can also be applied to square

matrices. In addition, an n by n matrix A can be decomposed into a product of

a lower- and an upper-triangular matrix, thus

A = LR.

(2.54)

In the literature this is also known as the LU decomposition from the use of ‘U’ for

‘upper triangular’. Here another mnemonic, ‘U’ for ‘unitary’, has been employed.

If the matrix A is symmetric and positive definite, the decomposition

A = LLT

(2.55)

is possible and is referred to as the Choleski decomposition.

A scaled form of this decomposition with unit diagonal elements for L can be written

A = LDL T

where D is a diagonal matrix.
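The Choleski factorisation (2.55) is equally easy to check. A sketch (Python with NumPy for illustration; the matrix is an invented Gram matrix, which guarantees positive definiteness):

```python
import numpy as np

# A symmetric positive definite matrix (Gram matrix of a full-rank B)
B = np.array([[2.0, 1.0],
              [0.0, 1.0],
              [1.0, 3.0]])
A = B.T @ B

L = np.linalg.cholesky(A)              # lower-triangular Choleski factor
print(np.allclose(A, L @ L.T))         # A = L L^T, see (2.55)
print(np.allclose(L, np.tril(L)))      # L really is lower-triangular
```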

To underline the importance of decompositions, it can be shown by direct

substitution that if

A = USVT

(2.53)

then the matrix

A+ = VS+ UT        (2.56)

where

S+ii = 1/Sii        for Sii ≠ 0
S+ii = 0        for Sii = 0        (2.57)

satisfies the four conditions (2.40), (2.41), (2.44) and (2.45), that is

AA+ A = USVT VS+ UT USVT        (2.58)
      = USS+ SVT
      = USVT = A        (2.59)


A+ AA+ = VS+ UT USVT VS+ UT
       = VS+ SS+ UT = VS+ UT = A+        (2.60)

and

(A+ A)T = (VS+ UT USVT)T = (VS+ SVT)T
        = VS+ SVT = A+ A.        (2.61)

Several of the above relationships depend on the diagonal nature of S and S+ and
on the fact that diagonal matrices commute under multiplication.
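The construction (2.56), (2.57) translates directly into code. A sketch (Python with NumPy for illustration; the tolerance below which a singular value is treated as zero, and the rank-one test matrix, are choices made here):

```python
import numpy as np

def svd_pinv(A, tol=1e-10):
    """A+ = V S+ U^T from the svd, with small singular values zeroed,
    following (2.56) and (2.57); tol is a choice made here."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_plus = np.array([1.0 / si if si > tol else 0.0 for si in s])
    return Vt.T @ np.diag(s_plus) @ U.T

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])           # rank 1: second column = 2 * first
Ap = svd_pinv(A)
print(np.allclose(A @ Ap @ A, A))    # condition (2.40)
print(np.allclose(Ap @ A @ Ap, Ap))  # condition (2.44)
```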

2.6. THE MATRIX EIGENVALUE PROBLEM

An eigenvalue e and eigenvector x of an n by n matrix A, real or complex, are

respectively a scalar and vector which together satisfy the equation

Ax = ex.

(2.62)

There will be up to n eigensolutions (e, x) for any matrix (Wilkinson 1965) and

finding them for various types of matrices has given rise to a rich literature. In

many cases, solutions to the generalised eigenproblem

Ax = eBx

(2.63)

are wanted, where B is another n by n matrix. For matrices which are of a size

that the computer can accommodate, it is usual to transform (2.63) into type

(2.62) if this is possible. For large matrices, an attempt is usually made to solve

(2.63) itself for one or more eigensolutions. In all the cases where the author has

encountered equation (2.63) with large matrices, A and B have fortunately been

symmetric, which provides several convenient simplifications, both theoretical and

computational.
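One standard simplification for symmetric A with B symmetric and positive definite uses the Choleski factor B = LLT: substituting x = L-Ty turns (2.63) into the symmetric standard problem (L-1AL-T)y = ey. A sketch (Python with NumPy for illustration; the 2 by 2 matrices are invented, and the explicit inverse is acceptable only at this tiny size):

```python
import numpy as np

# Invented symmetric A and positive definite B
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# Reduce A x = e B x to standard form via the Choleski factor B = L L^T
L = np.linalg.cholesky(B)
Linv = np.linalg.inv(L)
C = Linv @ A @ Linv.T                   # symmetric standard-form matrix
e, Y = np.linalg.eigh(C)
X = np.linalg.solve(L.T, Y)             # back-transform: x = L^-T y

# Each pair (e_i, x_i) satisfies the generalised problem (2.63)
for i in range(2):
    print(np.allclose(A @ X[:, i], e[i] * (B @ X[:, i])))
```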

Example 2.5. Illustration of the matrix eigenvalue problem

In quantum mechanics, the use of the variation method to determine approximate

energy states of physical systems gives rise to matrix eigenvalue problems if the

trial functions used are linear combinations of some basis functions (see, for

instance, Pauling and Wilson 1935, p 180ff).

If the trial function is F, and the energy of the physical system in question is

described by the Hamiltonian operator H, then the variation principle seeks

stationary values of the energy functional

C = (F, HF)/(F, F)        (2.64)

subject to the normalisation condition

(F, F) = 1

(2.65)

where the symbol ( , ) represents an inner product between the elements

separated by the comma within the parentheses. This is usually an integral over all


the dimensions of the system. If a linear combination of some functions fj,

j = 1, 2, . . . , n, is used for F, that is

F = x1f1 + x2f2 + . . . + xnfn        (2.66)

then the variation method gives rise to the eigenvalue problem

Ax = eBx

(2.63)

Aij = (fj, Hfi)

(2.67)

Bij = (fi, fj)

(2.68)

with

and

It is obvious that if B is a unit matrix, that is, if

(fi, fj) = δij = { 1        for i = j
                { 0        for i ≠ j        (2.69)

a problem of type (2.62) arises. A specific example of such a problem is equation

(11.1).


Chapter 3

THE SINGULAR-VALUE DECOMPOSITION AND

ITS USE TO SOLVE LEAST-SQUARES PROBLEMS

3.1. INTRODUCTION

This chapter presents an algorithm for accomplishing the powerful and versatile

singular-value decomposition. This allows the solution of a number of problems to

be realised in a way which permits instabilities to be identified at the same time.

This is a general strategy I like to incorporate into my programs as much as

possible since I find succinct diagnostic information invaluable when users raise

questions about computed answers; users do not in general raise too many idle

questions! They may, however, expect the computer and my programs to produce

reliable results from very suspect data, and the information these programs

generate together with a solution can often give an idea of how trustworthy are

the results. This is why the singular values are useful. In particular, the appearance of singular values differing greatly in magnitude implies that our data are

nearly collinear. Collinearity introduces numerical problems simply because small

changes in the data give large changes in the results. For example, consider the

following two-dimensional vectors:

A = (1, 0)T

B = (1, 0·1)T

C = (0·95, 0·1)T.

Vector C is very close to vector B, and both form an angle of approximately 6°

with vector A. However, while the angle between the vector sums (A + B) and

(A + C) is only about 0.07°, the angle between (B – A) and (C – A) is greater

than 26°. On the other hand, the set of vectors

A = (1, 0)T

D = (0, 1)T

E = (0, 0·95)T

gives angles between (A + D) and (A + E) and between (D – A) and (E – A) of

approximately 1·5°. In summary, the sum of collinear vectors is well determined,

the difference is not. Both the sum and difference of vectors which are not

collinear are well determined.
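The angles quoted above can be recomputed directly. A sketch (Python with NumPy for illustration):

```python
import numpy as np

def angle_deg(u, v):
    """Angle between two vectors, in degrees."""
    c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

A = np.array([1.0, 0.0])
B = np.array([1.0, 0.1])
C = np.array([0.95, 0.1])

print(angle_deg(A + B, A + C))   # about 0.07 degrees: sums well determined
print(angle_deg(B - A, C - A))   # over 26 degrees: differences poorly determined
```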


3.2. A SINGULAR-VALUE DECOMPOSITION ALGORITHM

It may seem odd that the first algorithm to be described in this work is designed to

compute the singular-value decomposition (svd) of a matrix. Such computations are

topics well to the back of most books on numerical linear algebra. However, it was

the algorithm below which first interested the author in the capabilities of small

computers. Moreover, while the svd is somewhat of a sledgehammer method for

many nutshell problems, its versatility in finding the eigensolutions of a real

symmetric matrix, in solving sets of simultaneous linear equations or in computing

minimum-length solutions to least-squares problems makes it a valuable building

block in programs used to tackle a variety of real problems.

This versatility has been exploited in a single small program suite of approximately

300 lines of BASIC code to carry out the above problems as well as to find inverses

and generalised inverses of matrices and to solve nonlinear least-squares problems

(Nash 1984b, 1985).

The mathematical problem of the svd has already been stated in §2.5. However,

for computational purposes, an alternative viewpoint is more useful. This considers the possibility of finding an orthogonal matrix V, n by n, which transforms the

real m by n matrix A into another real m by n matrix B whose columns are

orthogonal. That is, it is desired to find V such that

B = AV = (b1, b2, . . . , bn)

(3.1)

where

biT bj = Si² δij        (3.2)

and

VVT = VT V = 1n .

(3.3)

The Kronecker delta takes values

δij = { 1        for i = j
      { 0        for i ≠ j.        (3.4)

The quantities Si may, as yet, be either positive or negative, since only their

square is defined by equation (3.2). They will henceforth be taken arbitrarily as

positive and will be called singular values of the matrix A. The vectors

uj = bj/Sj

(3.5)

which can be computed when none of the Sj is zero, are unit orthogonal vectors.

Collecting these vectors into a real m by n matrix, and the singular values into a

diagonal n by n matrix, it is possible to write

B = US

(3.6)

UT U = 1 n

(3.7)

where

is a unit matrix of order n.

In the case that some of the Sj are zero, equations (3.1) and (3.2) are still valid,

but the columns of U corresponding to zero singular values must now be


constructed such that they are orthogonal to the columns of U computed via

equation (3.5) and to each other. Thus equations (3.6) and (3.7) are also satisfied.

An alternative approach is to set the columns of U corresponding to zero singular

values to null vectors. By choosing the first k of the singular values to be the

non-zero ones, which is always possible by simple permutations within the matrix

V, this causes the matrix UT U to be a unit matrix of order k augmented to order n

with zeros. This will be written

UT U = ( 1k  0 )
       ( 0   0 )        (3.8)

While not part of the commonly used definition of the svd, it is useful to require

the singular values to be sorted, so that

S11 ≥ S22 ≥ S33 ≥ . . . ≥ Skk ≥ . . . ≥ Snn.

This allows (2.53) to be recast as a summation

A = u1S11v1T + u2S22v2T + . . . + unSnnvnT.        (2.53a)

Partial sums of this series give a sequence of approximations

A1, A2, . . . , An

where, obviously, the last member of the sequence

An = A

since it corresponds to a complete reconstruction of the svd. The rank-one matrices

uj Sjj vjT

can be referred to as singular planes, and the partial sums (in order of decreasing

singular values) are partial svds (Nash and Shlien 1987).
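The partial sums can be formed explicitly. A sketch (Python with NumPy for illustration; the 3 by 3 symmetric test matrix is invented) showing that the approximation error falls as each singular plane is added:

```python
import numpy as np

A = np.array([[4.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 2.0]])
U, s, Vt = np.linalg.svd(A)

approx = np.zeros_like(A)
errors = []
for j in range(3):
    # add the j-th singular plane u_j S_jj v_j^T
    approx += s[j] * np.outer(U[:, j], Vt[j, :])
    errors.append(np.linalg.norm(A - approx))

print(errors)   # strictly decreasing; the full sum reconstructs A
```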

A combination of (3.1) and (3.6) gives

AV = US

(3.9)

or, using (3.3), the orthogonality of V,

A = USVT

(2.53)

which expresses the svd of A.

The preceding discussion is conditional on the existence and computability of a

suitable matrix V. The next section shows how this task may be accomplished.

3.3. ORTHOGONALISATION BY PLANE ROTATIONS

The matrix V sought to accomplish the orthogonalisation (3.1) will be built up as


a product of simpler matrices

V = V(1) V(2) . . . V(z)        (3.10)

where z is some index not necessarily related to the dimensions m and n of A, the

matrix being decomposed. The matrices used in this product will be plane

rotations. If V(k) is a rotation of angle φ in the ij plane, then all elements of V(k)

will be the same as those in a unit matrix of order n except for

Vii = Vjj = cos φ        Vji = –Vij = sin φ.        (3.11)

Thus V (k) affects only two columns of any matrix it multiplies from the right.

These columns will be labelled x and y. Consider the effect of a single rotation

involving these two columns

(X  Y) = (x  y) ( cos φ   –sin φ )
                ( sin φ    cos φ )        (3.12)

Thus we have

X = x cos φ + y sin φ
Y = –x sin φ + y cos φ.

(3.13)

If the resulting vectors X and Y are to be orthogonal, then

XT Y = 0 = –(xT x – yT y) sin φ cos φ + xT y(cos²φ – sin²φ).

(3.14)

There is a variety of choices for the angle φ, or more correctly for the sine and

cosine of this angle, which satisfy (3.14). Some of these are mentioned by

Hestenes (1958), Chartres (1962) and Nash (1975). However, it is convenient if

the rotation can order the columns of the orthogonalised matrix B by length, so

that the singular values are in decreasing order of size and those which are zero

(or infinitesimal) are found in the lower right-hand corner of the matrix S as in

equation (3.8). Therefore, a further condition on the rotation is that

XT X – xT x > 0.

(3.15)

For convenience, the columns of the product matrix

A(k) = AV(1) V(2) . . . V(k)        (3.16)

will be denoted ai, i = 1, 2, . . . , n. The progress of the orthogonalisation is then

observable if a measure Z of the non-orthogonality is defined

Z = Σ (aiT aj)²        (sum over all pairs i < j).        (3.17)

Since two columns orthogonalised in one rotation may be made non-orthogonal in

subsequent rotations, it is essential that this measure be reduced at each rotation.


Because only two columns are involved in the kth rotation, we have

Z(k) = Z(k–1) + (XT Y)² – (xT y)².

(3.18)

But condition (3.14) implies

Z(k) = Z(k–1) – (xT y)²        (3.19)

so that the non-orthogonality is reduced at each rotation.

The specific formulae for the sine and cosine of the angle of rotation are (see

e.g. Nash 1975) given in terms of the quantities

p = xT y        (3.20)

q = xT x – yT y        (3.21)

and

v = (4p² + q²)½.        (3.22)

They are

cos φ = [(v + q)/(2v)]½        (3.23)
sin φ = p/(v cos φ)        (3.24)

for q ≥ 0, and

sin φ = sgn(p)[(v – q)/(2v)]½        (3.25)
cos φ = p/(v sin φ)        (3.26)

for q < 0, where

sgn(p) = { 1        for p ≥ 0
         { –1        for p < 0.        (3.27)

Note that having two forms for the calculation of the functions of the angle of

rotation permits the subtraction of nearly equal numbers to be avoided. As the

matrix nears orthogonality p will become small, so that q and v are bound to have

nearly equal magnitudes.
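The two-branch computation (3.23)-(3.27) can be expressed compactly. The sketch below is illustrative Python of mine, not the book's Pascal; it assumes only p and q as defined in (3.20) and (3.21):

```python
import math

def sin_cos_of_rotation(p, q):
    # Cancellation-free evaluation of the rotation, following the
    # two-branch formulae (3.23)-(3.27); p = x^T y, q = x^T x - y^T y.
    v = math.sqrt(4.0 * p * p + q * q)       # equation (3.22)
    if v == 0.0:
        return 1.0, 0.0                      # columns orthogonal and of equal norm
    if q >= 0.0:
        c = math.sqrt((v + q) / (2.0 * v))   # (3.23): no subtraction of near-equals
        s = p / (v * c)                      # (3.24)
    else:
        s = math.sqrt((v - q) / (2.0 * v))   # (3.25)
        if p < 0.0:
            s = -s                           # sgn(p) of (3.27)
        c = p / (v * s)                      # (3.26)
    return c, s
```

Note that when p = 0 and q < 0 the formulae give c = 0, s = 1, which is precisely the column interchange (3.28) discussed below.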

In the first edition of this book, I chose to perform the computed rotation only when q ≥ 0, and to use

sin φ = 1
cos φ = 0    (3.28)

when q < 0. This effects an interchange of the columns of the current matrix A.

However, I now believe that it is more efficient to perform the rotations as defined in

the code presented. The rotations (3.28) were used to force nearly null columns of the

final working matrix to the right-hand side of the storage array. This will occur when

the original matrix A suffers from linear dependencies between the columns (that is,

is rank deficient). In such cases, the rightmost columns of the working matrix

eventually reflect the lack of information in the data in directions corresponding to

the null space of the matrix A. The current methods cannot do much about this lack

of information, and it is not sensible to continue computations on these columns. In

the current implementation of the method (Nash and Shlien 1987), we prefer to

ignore columns at the right of the working matrix which become smaller than a


specified tolerance. This has a side effect of speeding the calculations significantly

when rank deficient matrices are encountered.

3.4. A FINE POINT

Equations (3.15) and (3.19) obviously cause the algorithm just described to
proceed towards both an orthogonalisation and an ordering of the columns of the
resulting matrix A^(z). However the rotations must be arranged in some sequence

to carry this task to completion. Furthermore, it remains to be shown that some

sequences of rotations will not place the columns in disorder again. For suppose

a1 is orthogonal to all other columns and larger than any of them individually. A

sequential arrangement of the rotations to operate first on columns (1, 2), then

(1, 3), (1, 4), . . . , (1, n), followed by (2, 3), . . . , (2, n), (3, 4), . . . , ((n – 1), n) will

be called a cycle or sweep. Such a sweep applied to the matrix described can easily

yield a new a2 for which

a2^T a2 > a1^T a1    (3.29)

if, for instance, the original matrix has a2 = a3 and the norm of these vectors is

greater than 2^(−1/2) times the norm of a1. Another sweep of rotations will put

things right in this case by exchanging a1 and a2. However, once two columns

have achieved a separation related in a certain way to the non-orthogonality

measure (3.17), it can be shown that no subsequent rotation can exchange them.

Suppose that the algorithm has proceeded so far that the non-orthogonality

measure Z satisfies the inequality

Z < t²    (3.30)

where t is some positive tolerance. Then, for any subsequent rotation the

parameter p, equation (3.20), must obey

p² < t².    (3.31)

Suppose that all adjacent columns are separated in size so that

(3.32)

Then a rotation which changes ak (but not ak-1) cannot change the ordering of

the two columns. If x = ak, then straightforward use of equations (3.23) and (3.24)

or (3.25) and (3.26) gives

X^T X − x^T x = (v − q)/2 ≥ 0.    (3.33)

Using (3.31) and (3.22) in (3.33) gives

X^T X − x^T x < [(q² + 4t²)^(1/2) − q]/2.    (3.34)

Thus, once columns become sufficiently separated by size and the non-orthogonality sufficiently diminished, the column ordering is stable. When some

columns are equal in norm but orthogonal, the above theorem can be applied to

columns separated by size.

The general question of convergence in the case of equal singular values has been


investigated by T Hoy Booker (Booker 1985). The proof in exact arithmetic is

incomplete. However, for a method such as the algorithm presented here, which uses

tolerances for zero, Booker has shown that the cyclic sweeps must eventually

terminate.
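Before turning to the Pascal code, the cyclic-sweep procedure as a whole can be summarised in a short sketch. This is illustrative Python of mine (the names, the skip test and the convergence tolerance are my choices), not the algorithm 1 code itself:

```python
import math

def one_sided_svd(A, tol=1e-12, max_sweeps=30):
    # One-sided orthogonalisation by plane rotations, in the style of
    # Hestenes (1958) and Nash (1975).  A is a list of m rows of n entries.
    # Returns (B, V, sv): the columns of B = A V are mutually orthogonal,
    # V is orthogonal, and sv holds the singular values in decreasing order.
    m, n = len(A), len(A[0])
    B = [row[:] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for j in range(n - 1):                 # one cycle or sweep of pairs
            for k in range(j + 1, n):
                p = sum(B[i][j] * B[i][k] for i in range(m))
                qj = sum(B[i][j] ** 2 for i in range(m))
                qk = sum(B[i][k] ** 2 for i in range(m))
                q = qj - qk
                if p * p <= tol * tol * qj * qk and q >= 0.0:
                    continue                   # pair orthogonal and ordered
                v = math.sqrt(4.0 * p * p + q * q)
                if v == 0.0:
                    continue                   # both columns null
                if q >= 0.0:                   # formulae (3.23)-(3.26)
                    c = math.sqrt((v + q) / (2.0 * v))
                    s = p / (v * c)
                else:
                    s = math.sqrt((v - q) / (2.0 * v))
                    if p < 0.0:
                        s = -s
                    c = p / (v * s)
                for M in (B, V):               # rotate columns j, k of B and V
                    for i in range(len(M)):
                        d1, d2 = M[i][j], M[i][k]
                        M[i][j] = d1 * c + d2 * s
                        M[i][k] = -d1 * s + d2 * c
                rotated = True
        if not rotated:
            break                              # a full sweep made no rotation
    sv = [math.sqrt(sum(B[i][j] ** 2 for i in range(m))) for j in range(n)]
    return B, V, sv
```

Dividing each non-null column of B by its norm would give U, completing the decomposition A = U S V^T.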

Algorithm 1. Singular-value decomposition

procedure NashSVD(nRow, nCol: integer; {size of problem}
            var W: wmatrix; {working matrix}
            var Z: rvector); {squares of singular values}
{alg01.pas ==
  form a singular value decomposition of matrix A which is stored in the
  first nRow rows of working array W and the nCol columns of this array.
  The first nRow rows of W will become the product U * S of a
  conventional svd, where S is the diagonal matrix of singular values.
  The last nCol rows of W will be the matrix V of a conventional svd.
  On return, Z will contain the squares of the singular values. An
  extended form of this commentary can be displayed on the screen by
  removing the comment braces on the writeln statements below.

  Copyright 1988 J. C. Nash
}
var
  i, j, k, EstColRank, RotCount, SweepCount, slimit: integer;
  eps, e2, tol, vt, p, h2, x0, y0, q, r, c0, s0, c2, d1, d2: real;

procedure rotate; {STEP 10 as a procedure}
{This rotation acts on both U and V, by storing V at the bottom of U}
begin {apply the plane rotation to columns j and k of W}
  for i := 1 to nRow + nCol do
  begin
    d1 := W[i, j]; d2 := W[i, k];
    W[i, j] := d1 * c0 + d2 * s0;
    W[i, k] := -d1 * s0 + d2 * c0;
  end;
end; { rotate }

begin { procedure SVD }
  { -- remove the comment braces to allow message to be displayed --
  writeln('Nash Singular Value Decomposition (NashSVD).');
  writeln;
  writeln('The program takes as input a real matrix A.');
  writeln;
  writeln('Let U and V be orthogonal matrices, & S');
  writeln('a diagonal matrix, such that U'' A V = S.');
  writeln('Then A = U S V'' is the decomposition.');
  writeln('A is assumed to have more rows than columns. If it');
  writeln('does not, the svd of the transpose A'' gives the svd');
  writeln('of A, since A'' = V S U''.');
  writeln;
  writeln('If A has nRow rows and nCol



PREFACE TO THE SECOND EDITION

The first edition of this book was written between 1975 and 1977. It may come as a

surprise that the material is still remarkably useful and applicable in the solution of

numerical problems on computers. This is perhaps due to the interest of researchers

in the development of quite complicated computational methods which require

considerable computing power for their execution. More modest techniques have

received less time and effort of investigators. However, it has also been the case that

the algorithms presented in the first edition have proven to be reliable yet simple.

The need for simple, compact numerical methods continues, even as software

packages appear which relieve the user of the task of programming. Indeed, such

methods are needed to implement these packages. They are also important when

users want to perform a numerical task within their own programs.

The most obvious difference between this edition and its predecessor is that the

algorithms are presented in Turbo Pascal, to be precise, in a form which will operate

under Turbo Pascal 3.01a. I decided to use this form of presentation for the following

reasons:

(i) Pascal is quite similar to the Step-and-Description presentation of algorithms

used previously;

(ii) the codes can be typeset directly from the executable Pascal code, and the

very difficult job of proof-reading and correction avoided;

(iii) the Turbo Pascal environment is very widely available on microcomputer

systems, and a number of similar systems exist.

Section 1.6 and appendix 4 give some details about the codes and especially the

driver and support routines which provide examples of use.

The realization of this edition was not totally an individual effort. My research

work, of which this book represents a product, is supported in part by grants from

the Natural Sciences and Engineering Research Council of Canada. The Mathematics Department of the University of Queensland and the Applied Mathematics

Division of the New Zealand Department of Scientific and Industrial Research

provided generous hospitality during my 1987-88 sabbatical year, during which a

great part of the code revision was accomplished. Thanks are due to Mary Walker-Smith for reading early versions of the codes, to Maureen Clarke of IOP Publishing

Ltd for reminders and encouragement, and to the Faculty of Administration of the

University of Ottawa for use of a laser printer to prepare the program codes. Mary

Nash has been a colleague and partner for two decades, and her contribution to this

project in many readings, edits, and innumerable other tasks has been a large one.

In any work on computation, there are bound to be errors, or at least program


structures which operate in unusual ways in certain computing environments. I

encourage users to report to me any such observations so that the methods may be

improved.

J. C. Nash

Ottawa, 12 June 1989

PREFACE TO THE FIRST EDITION

This book is designed to help people solve numerical problems. In particular, it is

directed to those who wish to solve numerical problems on ‘small’ computers, that

is, machines which have limited storage in their main memory for program and

data. This may be a programmable calculator, even a pocket model, or it may

be a subsystem of a monster computer. The algorithms that are presented in the

following pages have been used on machines such as a Hewlett-Packard 9825

programmable calculator and an IBM 370/168 with Floating Point Systems Array

Processor. That is to say, they are designed to be used anywhere that a problem

exists for them to attempt to solve. In some instances, the algorithms will not be

as efficient as others available for the job because they have been chosen and

developed to be ‘small’. However, I believe users will find them surprisingly

economical to employ because their size and/or simplicity reduces errors and

human costs compared with equivalent ‘larger’ programs.

Can this book be used as a text to teach numerical methods? I believe it can.

The subject areas covered are, principally, numerical linear algebra, function

minimisation and root-finding. Interpolation, quadrature and differential equations are largely ignored as they have not formed a significant part of my own

work experience. The instructor in numerical methods will find perhaps too few

examples and no exercises. However, I feel the examples which are presented

provide fertile ground for the development of many exercises. As much as

possible, I have tried to present examples from the real world. Thus the origins of

the mathematical problems are visible in order that readers may appreciate that

these are not merely interesting diversions for those with time and computers

available.

Errors in a book of this sort, especially in the algorithms, can depreciate its

value severely. I would very much appreciate hearing from anyone who discovers

faults and will do my best to respond to such queries by maintaining an errata

sheet. In addition to the inevitable typographical errors, my own included, I

anticipate that some practitioners will take exception to some of the choices I

have made with respect to algorithms, convergence criteria and organisation of

calculations. Out of such differences, I have usually managed to learn something

of value in improving my subsequent work, either by accepting new ideas or by

being reassured that what I was doing had been through some criticism and had

survived.

There are a number of people who deserve thanks for their contribution to this

book and who may not be mentioned explicitly in the text:

(i) in the United Kingdom, the many members of the Numerical Algorithms

Group, of the Numerical Optimization Centre and of various university departments with whom I discussed the ideas from which the algorithms have condensed;


(ii) in the United States, the members of the Applied Mathematics Division of the

Argonne National Laboratory who have taken such an interest in the algorithms,

and Stephen Nash who has pointed out a number of errors and faults; and

(iii) in Canada, the members of the Economics Branch of Agriculture Canada for

presenting me with such interesting problems to solve, Kevin Price for careful and

detailed criticism, Bob Henderson for trying out most of the algorithms, Richard

Wang for pointing out several errors in chapter 8, John Johns for trying (and

finding errors in) eigenvalue algorithms, and not least Mary Nash for a host of

corrections and improvements to the book as a whole.

It is a pleasure to acknowledge the very important roles of Neville Goodman

and Geoff Amor of Adam Hilger Ltd in the realisation of this book.

J. C. Nash

Ottawa, 22 December 1977


Chapter 1

A STARTING POINT

1.1. PURPOSE AND SCOPE

This monograph is written for the person who has to solve problems with (small)

computers. It is a handbook to help him or her obtain reliable answers to specific

questions, posed in a mathematical way, using limited computational resources.

To this end the solution methods proposed are presented not only as formulae but

also as algorithms, those recipes for solving problems which are more than merely

a list of the mathematical ingredients.

There has been an attempt throughout to give examples of each type of

calculation and in particular to give examples of cases which are prone to upset

the execution of algorithms. No doubt there are many gaps in the treatment

where the experience which is condensed into these pages has not been adequate

to guard against all the pitfalls that confront the problem solver. The process of

learning is continuous, as much for the teacher as the taught. Therefore, the user

of this work is advised to think for him/herself and to use his/her own knowledge and

familiarity of particular problems as much as possible. There is, after all, barely a

working career of experience with automatic computation and it should not seem

surprising that satisfactory methods do not exist as yet for many problems. Throughout the sections which follow, this underlying novelty of the art of solving numerical

problems by automatic algorithms finds expression in a conservative design policy.

Reliability is given priority over speed and, from the title of the work, space

requirements for both the programs and the data are kept low.

Despite this policy, it must be mentioned immediately and with some

emphasis that the algorithms may prove to be surprisingly efficient from a

cost-of-running point of view. In two separate cases where explicit comparisons

were made, programs using the algorithms presented in this book cost less to

run than their large-machine counterparts. Other tests of execution times for

algebraic eigenvalue problems, roots of a function of one variable and function

minimisation showed that the eigenvalue algorithms were by and large ‘slower’

than those recommended for use on large machines, while the other test problems

were solved with notable efficiency by the compact algorithms. That ‘small’

programs may be more frugal than larger, supposedly more efficient, ones based

on different algorithms to do the same job has at least some foundation in the way

today’s computers work.

Since the first edition of this work appeared, a large number and variety of

inexpensive computing machines have appeared. Often termed the ‘microcomputer

revolution’, the widespread availability of computing power in forms as diverse as

programmable calculators to desktop workstations has increased the need for


suitable software of all types, including numerical methods. The present work is

directed at the user who needs, for whatever reason, to program a numerical method

to solve a problem. While software packages and libraries exist to provide for the

solution of numerical problems, financial, administrative or other obstacles may

render their use impossible or inconvenient. For example, the programming tools

available on the chosen computer may not permit the packaged software to be used.

Firstly, most machines are controlled by operating systems which control (and

sometimes charge for) the usage of memory, storage, and other machine resources. In

both compilation (translation of the program into machine code) and execution, a

smaller program usually will make smaller demands on resources than a larger one.

On top of this, the time of compilation is usually related to the size of the source

code.

Secondly, once the program begins to execute, there are housekeeping operations

which must be taken care of:

(i) to keep programs and data belonging to one task or user separate from those

belonging to others in a time-sharing environment, and

(ii) to access the various parts of the program and data within the set of

resources allocated to a single user.

Studies conducted some years ago by Dr Maurice Cox of the UK National

Physical Laboratory showed that (ii) requires about 90% of the time a computer

spends with a typical scientific computation. Only about 10% of the effort goes to

actual arithmetic. This mix of activity will vary greatly with the machine and problem

under consideration. However, it is not unreasonable that a small program can use

simpler structures, such as address maps and decision tables, than a larger routine. It

is tempting to suggest that the computer may be able to perform useful work with a

small program while deciding what to do with a larger one. Gathering specific

evidence to support such conjectures requires the fairly tedious work of benchmarking. Moreover, the results of the exercise are only valid as long as the machine,

operating system, programming language translators and programs remain

unchanged. Where performance is critical, as in the case of real-time computations,

for example in air traffic control, then benchmarking will be worthwhile. In other

situations, it will suffice that programs operate correctly and sufficiently quickly that

the user is not inconvenienced.

This book is one of the very few to consider algorithms which have very low

storage requirements. The first edition appeared just as programmable calculators

and the first microcomputers were beginning to make their presence felt. These

brought to the user’s desk a quantum improvement in computational power.

Comparisons with the earliest digital computers showed that even a modest microcomputer was more powerful. It should be noted, however, that the programmer did

not have to handle all the details of arithmetic and data storage, as on the early

computers, thanks to the quick release of programming language translators. There

is unfortunately still a need to be vigilant for errors in the floating-point arithmetic

and the special function routines. Some aids to such checking are mentioned later in

§1.2.

Besides the motivation of cost savings or the desire to use an available and


possibly under-utilised small computer, this work is directed to those who share

my philosophy that human beings are better able to comprehend and deal with

small programs and systems than large ones. That is to say, it is anticipated that

the costs involved in implementing, modifying and correcting a small program will

be lower for small algorithms than for large ones, though this comparison will

depend greatly on the structure of the algorithms. By way of illustration, I

implemented and tested the eigenvalue/vector algorithm (algorithm 13) in under

half an hour from a 10 character/second terminal in Aberystwyth using a Xerox

Sigma 9 computer in Birmingham. The elapsed time includes my instruction in the

use of the system which was of a type I had not previously encountered. I am

grateful to Mrs Lucy Tedd for showing me this system. Dr John Johns of the

Herzberg Institute of Astrophysics was able to obtain useful eigensolutions from

the same algorithm within two hours of delivery of a Hewlett-Packard 9825

programmable calculator. He later discovered a small error in the prototype of

the algorithm.

The topics covered in this work are numerical linear algebra and function

minimisation. Why not differential equations? Quite simply because I have had

very little experience with the numerical solution of differential equations except

by techniques using linear algebra or function minimisation. Within the two broad

areas, several subjects are given prominence. Linear equations are treated in

considerable detail with separate methods given for a number of special situations.

The algorithms given here are quite similar to those used on larger machines. The

algebraic eigenvalue problem is also treated quite extensively, and in this edition, a

method for complex matrices is included. Computing the eigensolutions of a general

square matrix is a problem with many inherent difficulties, but we shall not dwell on

these at length.

Constrained optimisation is still a topic where I would be happier to offer more

material, but general methods of sufficient simplicity to include in a handbook of this

sort have eluded my research efforts. In particular, the mathematical programming

problem is not treated here.

Since the aim has been to give a problem-solving person some tools with which

to work, the mathematical detail in the pages that follow has been mostly confined

to that required for explanatory purposes. No claim is made to rigour in any

‘proof’, though a sincere effort has been made to ensure that the statement of

theorems is correct and precise.

1.2. MACHINE CHARACTERISTICS

In the first edition, a ‘small computer’ was taken to have about 6000 characters of

main memory to hold both programs and data. This logical machine, which might be

a part of a larger physical computer, reflected the reality facing a quite large group of

users in the mid- to late-1970s.

A more up-to-date definition of ‘small computer’ could be stated, but it is not

really necessary. Users of this book are likely to be those scientists, engineers, and

statisticians who must, for reasons of financial or administrative necessity or

convenience, carry out their computations in environments where the programs


cannot be acquired simply and must, therefore, be written in-house. There are also a

number of portable computers now available. This text is being entered on a Tandy

Radio Shack TRS-80 Model 100, which is only the size of a large book and is

powered by four penlight batteries.

Users of the various types of machines mentioned above often do not have much

choice as to the programming tools available. On ‘borrowed’ computers, one has to

put up with the compilers and interpreters provided by the user who has paid for the

resource. On portables, the choices may be limited by the decisions of the manufacturer. In practice, I have, until recently, mostly programmed in BASIC, despite its

limitations, since it has at least been workable on most of the machines available to

me.

Another group of users of the material in this book is likely to be software

developers. Some scenarios which have occurred are:

—software is being developed in a particular computing environment (e.g. LISP

for artificial intelligence) and a calculation is required for which suitable off-the-shelf

routines are not available;

—standard routines exist but when linked into the package cause the executable

code to be too large for the intended disk storage or memory;

—standard routines exist, but the coding is incompatible with the compiler or

interpreter at hand.

It is worth mentioning that programming language standards have undergone

considerable development in the last decade. Nevertheless, the goal of portable

source codes of numerical methods is still only won by careful and conservative

programming practices.

Because of the emphasis on the needs of the user to program the methods, there is

considerable concern in this book to keep the length and complexity of the

algorithms to a minimum. Ideally, I like program codes for the algorithms to be no

longer than a page of typed material, and at worst, less than three pages. This makes

it feasible to implement the algorithms in a single work session. However, even this

level of effort is likely to be considered tedious and it is unnecessary if the code can be

provided in a suitable form. Here we provide source code in Turbo Pascal for the

algorithms in this book and for the driver and support routines to run them (under

Turbo Pascal version 3.01a).

The philosophy behind this book remains one of working with available tools

rather than complaining that better ones exist, albeit not easily accessible. This

should not lead to complacency in dealing with the machine but rather to an active

wariness of any and every feature of the system. A number of these can and should be

checked by using programming devices which force the system to reveal itself in spite

of the declarations in the manual(s). Others will have to be determined by exploring

every error possibility when a program fails to produce expected results. In most

cases programmer error is to blame, but I have encountered at least one system error

in each of the systems I have used seriously. For instance, trigonometric functions are

usually computed by power series approximation. However, these approximations

have validity over specified domains, usually [0, π/4] or [0, π/2] (see Abramowitz and

Stegun 1965, p 76). Thus the argument of the function must first be transformed to


bring it into the appropriate range. For example

sin(π − φ) = sin φ    (1.1)

or

sin(π/2 − φ) = cos φ.    (1.2)

Unless this range reduction is done very carefully the results may be quite

unexpected. On one system, hosted by a Data General NOVA, I have observed

that the sine of an angle near π and the cosine of an angle near π/2 were both

computed as unity instead of a small value, due to this type of error. Similarly, on

some early models of Hewlett-Packard pocket calculators, the rectangular-to-polar

coordinate transformation may give a vector 180° from the correct direction. (This

appears to have been corrected now.)
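The range reduction can be illustrated with a deliberately simple sine routine. This Python sketch is mine, not taken from any of the systems discussed; it shows how identities such as (1.1), applied carefully, keep the series argument within the domain of validity:

```python
import math

def sin_series(x):
    # Truncated Taylor series for sin, adequate only for |x| <= pi/2.
    term, total = x, x
    for k in range(1, 10):
        term *= -x * x / ((2 * k) * (2 * k + 1))
        total += term
    return total

def sin_reduced(x):
    # Reduce the argument to [0, pi/2] before applying the series,
    # using periodicity, oddness, and the identity (1.1).
    x = math.fmod(x, 2.0 * math.pi)
    if x < 0.0:
        return -sin_reduced(-x)          # sin(-x) = -sin(x)
    if x > math.pi:
        return -sin_reduced(x - math.pi) # sin(x + pi) = -sin(x)
    if x > math.pi / 2.0:
        x = math.pi - x                  # equation (1.1)
    return sin_series(x)
```

Done this way, an argument near π is reduced to a small value before the series is applied; a careless reduction is exactly what produced the erroneous results mentioned above.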

Testing the quality of the floating-point arithmetic and special functions is

technically difficult and tedious. However, some developments which aid the user

have been made by public-spirited scientists. Of these, I consider the most worthy

example to be PARANOIA, a program to examine the floating-point arithmetic

provided by a programming language translator. Devised originally by Professor W

Kahan of the University of California, Berkeley, it has been developed and distributed in a number of programming languages (Karpinski 1985). Its output is

didactic, so that one does not have to be a numerical analyst to interpret the results. I

have used the BASIC, FORTRAN, Pascal and C versions of PARANOIA, and have seen

reports of Modula-2 and ADA®† versions.

In the area of special functions, Cody and Waite (1980) have devised software to

both calculate and test a number of the commonly used transcendental functions

(sin, cos, tan, log, exp, sqrt, x^y). The ELEFUNT testing software is available in their

book, written in FORTRAN. A considerable effort would be needed to translate it into

other programming languages.

An example from our own work is the program DUBLTEST, which is designed to

determine the precision to which the standard special functions in BASIC are

computed (Nash and Walker-Smith 1987). On the IBM PC, many versions of

Microsoft BASIC (GWBASIC, BASICA) would only compute such functions in single

precision, even if the variables involved as arguments or results were double

precision. For some nonlinear parameter estimation problems, the resulting low

precision results caused unsatisfactory operation of our software.

Since most algorithms are in some sense iterative, it is necessary that one has

some criterion for deciding when sufficient progress has been made that the

execution of a program can be halted. While, in general, I avoid tests which

require knowledge of the machine, preferring to use the criterion that no progress

has been made in an iteration, it is sometimes convenient or even necessary to

employ tests involving tolerances related to the structure of the computing device

at hand.

The most useful property of a system which can be determined systematically is

the machine precision. This is the smallest number, eps, such that

1 + eps > 1                                                (1.3)

† ADA is a registered name of the US Department of Defense.

within the arithmetic of the system. Two programs in FORTRAN for determining the

machine precision, the radix or base of the arithmetic, and machine rounding or

truncating properties have been given by Malcolm (1972). The reader is cautioned

that, since these programs make use of tests of conditions like (1.3), they may be

frustrated by optimising compilers which are able to note that (1.3) in exact

arithmetic is equivalent to

eps > 0.                                                   (1.4)

Condition (1.4) is not meaningful in the present context. The Univac compilers

have acquired some notoriety in this regard, but they are by no means the only

offenders.
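Condition (1.3) can be applied directly. The sketch below (in Python rather than the book's Turbo Pascal; machine_eps is an invented name) halves a trial value until (1.3) fails; because Python arithmetic is performed on stored IEEE double-precision values, the optimiser trap described above does not arise here.

```python
def machine_eps():
    # Find the smallest power of 2 such that 1 + eps > 1 holds for
    # stored values, i.e. condition (1.3); halving stops one step
    # before the comparison would fail.
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps
```

On IEEE double-precision arithmetic this yields eps = 2^(−52), consistent with R = 2 and t = 53 digits in formula (1.9) below.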

To find the machine precision and radix by using arithmetic of the computer

itself, it is first necessary to find a number q such that (1 + q ) and q are

represented identically, that is, the representation of 1 having the same exponent

as q has a digit in the (t + 1)th radix position where t is the number of radix digits

in the floating-point mantissa. As an example, consider a four decimal digit

machine. If q = 10,000 or greater, then q is represented as (say)

0·1 * 1E5

while 1 is represented as

0·00001 * 1E5.

The action of storing the five-digit sum

0·10001 * 1E5

in a four-digit word causes the last digit to be dropped. In the example,

q = 10 000 is the smallest number which causes (1 + q) and q to be represented

identically, but any number

q > 9999

will have the same property. If the machine under consideration has radix R, then

any

q ≥ R^t                                                    (1.5)

will have the desired property. If, moreover, q and R^(t+1) are represented so that

q < R^(t+1)                                                (1.6)

then

q + R > q.                                                 (1.7)

In our example, R = 10 and t = 4 so the largest q consistent with (1.6) is

q = 10^5 − 10 = 99 990 = 0·9999 * 1E5

and

99 990 + 10 = 100 000 = 0·1000 * 1E6 > q.

Starting with a trial value, say q = 1, successive doubling will give some number

q = 2^k

such that (q + 1) and q are represented identically. By then setting r to successive

integers 2, 3, 4 , . . . , a value such that

q + r > q                                                  (1.8)

will be found. On a machine which truncates, r is then the radix R. However, if

the machine rounds in some fashion, the condition (1.8) may be satisfied for r < R.

Nevertheless, the representations of q and (q + r) will differ by R. In the example,

doubling will produce q = 16 384 which will be represented as

0·1638 * 1E5

so q + r is represented as

0·1639 * 1E5

for some r ≤ 10. Then subtraction of these gives

0·0001 * 1E5 = 10.

Unfortunately, it is possible to foresee situations where this will not work.

Suppose that q = 99 990, then we have

0·9999 * 1E5 + 10 = 0·1000 * 1E6

and

0·1000 * 1E6 − 0·9999 * 1E5 = R'.

But if the second number in this subtraction is first transformed to

0·0999 * 1E6

then R' is assigned the value 100. Successive doubling should not, unless the

machine arithmetic is extremely unusual, give q this close to the upper bound of

(1.6).

Suppose that R has been found and that it is greater than two. Then if the

representation of q + (R – 1) is greater than that of q, the machine we are using

rounds, otherwise it chops or truncates the results of arithmetic operations.
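The doubling-and-incrementing procedure just described can be sketched as follows (a Python illustration; find_radix is an invented name, and the book's own codes are in Pascal):

```python
def find_radix():
    # Step 1: double q until (q + 1) and q are stored identically,
    # i.e. until adding 1 falls off the end of the mantissa.
    q = 1.0
    while (q + 1.0) - q == 1.0:
        q *= 2.0
    # Step 2: increase r by successive integers until q + r is stored
    # differently from q, as in condition (1.8); the stored difference
    # then recovers the radix R (or a small multiple of it on unusual
    # arithmetics, as discussed in the text).
    r = 1.0
    while (q + r) - q == 0.0:
        r += 1.0
    return (q + r) - q
```

On IEEE double-precision binary hardware this reports a radix of 2.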

The number of radix digits t is now easily found as the smallest integer such

that

R^t + 1

is represented identically to R^t. Thus the machine precision is given as

eps = R^(1−t) = R^(−(t−1)).                                (1.9)

In the example, R = 10, t = 4, so

R^(−3) = 0·001.

Thus

1 + 0·001 = 1·001 > 1

but 1 + 0·0009 is, on a machine which truncates, represented as 1.
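Once R is known, the count of digits t follows the same pattern. As a Python illustration (find_t is an invented name), for IEEE double precision with R = 2 this gives t = 53 and hence eps = 2^(−52) by (1.9):

```python
def find_t(R):
    # Smallest t such that R**t + 1 is stored identically to R**t;
    # t is then the number of radix digits in the mantissa.
    t = 1
    p = R
    while (p + 1.0) - p == 1.0:
        p *= R
        t += 1
    return t

t = find_t(2.0)          # number of binary digits in the mantissa
eps = 2.0 ** (1 - t)     # machine precision from formula (1.9)
```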

In all of the previous discussion concerning the computation of the machine

precision it is important that the representation of numbers be that in the

memory, not in the working registers where extra digits may be carried. On a

Hewlett-Packard 9830, for instance, it was necessary when determining the

so-called ‘split precision’ to store numbers specifically in array elements to force

the appropriate truncation.

The above discussion has assumed a model of floating-point arithmetic which may

be termed an additive form in that powers of the radix are added together and the

entire sum multiplied by some power of the radix (the exponent) to provide the final

quantity representing the desired real number. This representation may or may not

be exact. For example, the fraction 1/10 cannot be exactly represented in additive binary

(radix 2) floating-point arithmetic. While there are other models of floating-point

arithmetic, the additive form is the most common, and is used in the IEEE binary

and radix-free floating-point arithmetic standards. (The March, 1981, issue of IEEE

Computer magazine, volume 14, number 3, pages 51-86 contains a lucid description of

the binary standard and its motivations.)

If we are concerned with having absolute upper and lower bounds on computed

quantities, interval arithmetic is possible, but it is not commonly supported by programming languages (an exception is Pascal-SC (Kulisch 1987)). Despite the obvious importance of

assured bounds on results, the perceived costs of using interval arithmetic have

largely prevented its widespread use.

The development of standards for floating-point arithmetic has the great benefit

that results of similar calculations on different machinery should be the same.

Furthermore, manufacturers have been prompted to develop hardware implementations of these standards, notably the Intel 80x87 and Motorola 68881 families of

circuit devices. Hewlett-Packard implemented a decimal version of the IEEE 854

standard in their HP 71B calculator.

Despite such developments, there continues to be much confusion and misinformation concerning floating-point arithmetic. Because an additive decimal form of

arithmetic can represent fractions such as 1/10 exactly, and in general avoid input-output conversion errors, developers of software products using such arithmetic

(usually in binary coded decimal or BCD form) have been known to claim that it has

'no round-off error', which is patently false. I personally prefer decimal arithmetic, in

that data entered into a calculation can generally be represented exactly, so that a

display of the stored raw data reproduces the input familiar to the user. Nevertheless,

the differences between good implementations of floating-point arithmetic, whether

binary or decimal, are rarely substantive.

While the subject of machine arithmetic is still warm, note that the mean of two

numbers may be calculated to be smaller or greater than either! An example in

four-figure decimal arithmetic will serve as an illustration of this.

            a       b       a + b         (a + b)/2

Exact       5008    5007    10015         5007·5

Rounded     5008    5007    1002 * 10     501·0 * 10 = 5010

Truncated   5008    5007    1001 * 10     500·5 * 10 = 5005

That this can and does occur should be kept in mind whenever averages are

computed. For instance, the calculations are quite stable if performed as

(a + b)/2 = 5000+[(a – 5000) + (b – 5000)]/2.

Taking account of every eventuality of this sort is nevertheless extremely tedious.
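Python's decimal module can replay the four-figure example above exactly, with a four-significant-digit context standing in for the hypothetical machine (this is an illustration, not one of the book's codes):

```python
from decimal import Decimal, getcontext, ROUND_HALF_UP

# A four-digit decimal "machine" that rounds, as in the middle row
# of the table above.
getcontext().prec = 4
getcontext().rounding = ROUND_HALF_UP

a, b = Decimal(5008), Decimal(5007)

naive = (a + b) / 2      # a + b first rounds to 1002 * 10, so the
                         # "mean" 5010 exceeds both a and b

# The stable rearrangement from the text: the bracketed sum is exact,
# so the result stays within [b, a] (stored as 5008, since 5007.5
# itself needs five digits).
shifted = 5000 + ((a - 5000) + (b - 5000)) / 2
```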

Another annoying characteristic of small machines is the frequent absence of

extended precision, also referred to as double precision, in which extra radix digits

are made available for calculations. This permits the user to carry out arithmetic

operations such as accumulation, especially of inner products of vectors, and

averaging with less likelihood of catastrophic errors. On equipment which functions with number representations similar to the IBM/360 systems, that is, six

hexadecimal (R = 16) digits in the mantissa of each number, many programmers

use the so-called ‘double precision’ routinely. Actually t = 14, which is not double

six. In most of the calculations that I have been asked to perform, I have not

found such a sledgehammer policy necessary, though the use of this feature in

appropriate situations is extremely valuable. The fact that it does not exist on

most small computers has therefore coloured much of the development which

follows.

Finally, since the manufacturers’ basic software has been put in question above,

the user may also wonder about their various application programs and packages.

While there are undoubtedly some ‘good’ programs, my own experience is that the

quality of such software is very mixed. Badly written and poorly documented

programs may take longer to learn and understand than a satisfactory homegrown

product takes to code and debug. A major fault with many software products is that

they lack references to the literature and documentation of their pedigree and

authorship. Discussion of performance and reliability tests may be restricted to very

simple examples. Without such information, it may be next to impossible to

determine the methods used or the level of understanding of the programmer of the

task to be carried out, so that the user is unable to judge the quality of the product.

Some developers of mathematical and statistical software are beginning to recognise

the importance of background documentation, and their efforts will hopefully be

rewarded with sales.

1.3. SOURCES OF PROGRAMS

When the first edition of this book was prepared, there were relatively few sources of

mathematical software in general, and in essence none (apart from a few manufacturers’ offerings) for users of small computers. This situation has changed remarkably, with some thousands of suppliers. Source codes of numerical methods,

however, are less widely available, yet many readers of this book may wish to

conduct a search for a suitable program already coded in the programming language

to be used before undertaking to use one of the algorithms given later.

How should such a search be conducted? I would proceed as follows.

First, if FORTRAN is the programming language, I would look to the major

collections of mathematical software in the Collected Algorithms of the Association for

Computing Machinery (ACM). This collection, abbreviated as CALGO, is comprised

of all the programs published in the Communications of the ACM (up to 1975) and

the ACM Transactions on Mathematical Software (since 1975). Other important

collections are EISPACK, LINPACK, FUNPACK and MINPACK, which concern algebraic

eigenproblems, linear equations, special functions and nonlinear least squares minimisation problems. These and other packages are, at time of writing, available from

the Mathematics and Computer Sciences Division of the Argonne National Laboratory of the US Department of Energy. For those users fortunate enough to have

access to academic and governmental electronic mail networks, an index of software

available can be obtained by sending the message

SEND INDEX

to the pseudo-user NETLIB at node ANL-MCS on the ARPA network (Dongarra and

Grosse 1987). The software itself may be obtained by a similar mechanism.

Suppliers such as the Numerical Algorithms Group (NAG), International Mathematical and Statistical Libraries (IMSL), C Abaci, and others, have packages

designed for users of various computers and compilers, but provide linkable object

code rather than the FORTRAN source. C Abaci, in particular, allows users of the

Scientific Desk to also operate the software within what is termed a ‘problem solving

environment’ which avoids the need for programming.

For languages other than FORTRAN, less software is available. Several collections of

programs and procedures have been published as books, some with accompanying

diskettes, but once again, the background and true authorship may be lacking. The

number of truly awful examples of badly chosen, badly coded algorithms is alarming,

and my own list of these too long to include here.

Several sources I consider worth looking at are the following.

Maindonald (1984)

—A fairly comprehensive collection of programs in BASIC (for a Digital Equipment Corporation VAX computer) is presented, covering linear estimation,

statistical distributions and pseudo-random numbers.

Nash and Walker-Smith (1987)

—Source codes in BASIC are given for six nonlinear minimisation methods and a

large selection of examples. The algorithms correspond, by and large, to those

presented later in this book.

LEQB05 (Nash 1984b, 1985)

—This single ‘program’ module (actually there are three starting points for

execution) was conceived as a joke to show how small a linear algebra package

could be made. In just over 300 lines of BASIC is the capability to solve linear

equations, linear least squares, matrix inverse and generalised inverse, symmetric matrix eigenproblem and nonlinear least squares problems. The joke

back-fired in that the capability of this program, which ran on the Sinclair ZX81

computer among other machines, is quite respectable.

Kahaner, Moler and Nash (1989)

—This numerical analysis textbook includes FORTRAN codes which illustrate the

material presented. The authors have taken pains to choose this software for

quality. The user must, however, learn how to invoke the programs, as there is

no user interface to assist in problem specification and input.

Press et al (1986) Numerical Recipes

—This is an ambitious collection of methods with wide capability. Codes are

offered in FORTRAN, Pascal, and C. However, it appears to have been only

superficially tested and the examples presented are quite simple. It has been

heavily advertised.

Many other products exist and more are appearing every month. Finding out

about them requires effort, the waste of which can sometimes be avoided by using

modern online search tools. Sadly, more effort is required to determine the quality of

the software, often after money has been spent.

Finally on sources of software, readers should be aware of the Association for

Computing Machinery (ACM) Transactions on Mathematical Software which publishes research papers and reports algorithms. The algorithms themselves are available after a delay of approximately 1 year on NETLIB and are published in full in the

Collected Algorithms of the ACM. Unfortunately, many are now quite large programs, and the Transactions on Mathematical Software (TOMS) usually only

publishes a summary of the codes, which is insufficient to produce a working

program. Moreover, the programs are generally in FORTRAN.

Other journals which publish algorithms in some form or other are Applied

Statistics (Journal of the Royal Statistical Society, Part C), the Society for Industrial

and Applied Mathematics (SIAM) journals on Numerical Analysis and on Scientific

and Statistical Computing, the Computer Journal (of the British Computer Society),

as well as some of the specialist journals in computational statistics, physics,

chemistry and engineering. Occasionally magazines, such as Byte or PC Magazine,

include articles with interesting programs for scientific or mathematical problems.

These may be of very variable quality depending on the authorship, but some

exceptionally good material has appeared in magazines, which sometimes offer the

codes in machine-readable form, such as the Byte Information Exchange (BIX) and

disk ordering service. The reader has, however, to be assiduous in verifying the

quality of the programs.

1.4. PROGRAMMING LANGUAGES USED AND STRUCTURED

PROGRAMMING

The algorithms presented in this book are designed to be coded quickly and easily for

operation on a diverse collection of possible target machines in a variety of

programming languages. Originally, in preparing the first edition of the book, I

considered presenting programs in BASIC, but found at the time that the various

dialects of this language were not uniform in syntax. Since then, International

Standard Minimal BASIC (ISO 6373-1984) has been published, and most commonly

available BASICs will run Minimal BASIC without difficulty. The obstacle for the user is

that Minimal BASIC is too limited for most serious computing tasks, in that it lacks

string and file handling capabilities. Nevertheless, it is capable of demonstrating all

the algorithms in this book.

As this revision is being developed, efforts are ongoing to agree an international

standard for Full BASIC. Sadly, in my opinion, these efforts do not reflect the majority

of existing commercial and scientific applications, which are coded in a dialect of

BASIC compatible with language processors from Microsoft Corporation or Borland

International (Turbo BASIC).

Many programmers and users do not wish to use BASIC, however, for reasons quite

apart from capability. They may prefer FORTRAN, APL, C, Pascal, or some other

programming language. On certain machinery, users may be forced to use the

programming facilities provided. In the 1970s, most Hewlett-Packard desktop

computers used exotic programming languages of their own, and this has continued

to the present on programmable calculators such as the HP 15C. Computers offering

parallel computation usually employ programming languages with special extensions

to allow the extra functionality to be exploited.

As an author trying to serve this fragmented market, I have therefore wanted to

keep to my policy of presenting the algorithms in step-and-description form.

However, implementations of the algorithms allow their testing, and direct publication of a working code limits the possibilities for typographical errors. Therefore,

in this edition, the step-and-description codes have been replaced by Turbo Pascal

implementations. A coupon for the diskette of the codes is included. Turbo Pascal

has a few disadvantages, notably some differences from International Standard

Pascal, but one of its merits (others are discussed in §1.6) is that it allows the

algorithms to be presented in a manner which is readable and structured.

In recent years the concepts of structured and modular programming have

become very popular, to the extent that one programming language (Modula-2) is

founded on such principles. The interested reader is referred to Kernighan and

Plauger (1974) or Yourdon (1975) for background, and to Riley (1988) for a more

modern exposition of these ideas. In my own work, I have found such concepts

extremely useful, and I recommend them to any practitioner who wishes to keep his

debugging and reprogramming efforts to a minimum. Nevertheless, while modularity is relatively easy to impose at the level of individual tasks such as the decomposition of a matrix or the finding of the minimum of a function along a line, it is not

always reasonable to insist that the program avoid GOTO instructions. After all, in

aiming to keep memory requirements as low as possible, any program code which

can do double duty is desirable. If possible, this should be collected into a

subprogram. In a number of cases this will not be feasible, since the code may have to

be entered at several points. Here the programmer has to make a judgement between

compactness and readability of his program. I have opted for the former goal when

such a decision has been necessary and have depended on comments and the essential

shortness of the code to prevent it from becoming incomprehensible.

The coding of the algorithms in the book is not as compact as it might be in a

specific application. In order to maintain a certain generality, I have chosen to allow

variables and parameters to be passed to procedures and functions from fairly

general driver programs. If algorithms are to be placed in-line in applications, it is

possible to remove some of this program ‘glue’. Furthermore, some features may not

always be necessary, for example, computation of eigenvectors in the Jacobi method

for eigensolutions of a real symmetric matrix (algorithm 14).

It should also be noted that I have taken pains to make it easy to save a ‘copy’ of

the screen output to a file by duplicating all the output statements, that is the ‘write’

and ‘writeln’ commands, so that output is copied to a file which the user may name.

(These statements are on the disk files, but deleted from the listings to reduce space

and improve readability.) Input is allowed from an input file to allow examples to be

presented without the user needing to type the appropriate response other than the

name of the relevant ‘example’ file.

Furthermore, I have taken advantage of features within the MS-DOS operating

system, and supported by compiler directives in Turbo Pascal, which allow for

pipelining of input and output. This has allowed me to use batch files to automate the

running of tests.

In the driver programs I have tried to include tests of the results of calculations, for

example, the residuals in eigenvalue computations. In practice, I believe it is

worthwhile to perform these calculations. When memory is at a premium, they can

be performed ‘off-line’ in most cases. That is, the results can be saved to disk

(backing storage) and the tests computed as a separate task, with data brought in

from the disk only as needed.

These extra features use many extra bytes of code, but are, of course, easily

deleted. Indeed, for specific problems, 75% or more of the code can be removed.

1.5. CHOICE OF ALGORITHMS

The algorithms in this book have been chosen for their utility in solving a variety of

important problems in computational mathematics and their ease of implementation

to short programs using relatively little working storage. Many topics are left out,

despite their importance, because I feel that there has been insufficient development in

directions relevant to compact numerical methods to allow for a suitable algorithm

to be included. For example, over the last 15 years I have been interested in methods

for the mathematical programming problem which do not require a tableau to be

developed either explicitly or implicitly, since these techniques are generally quite

memory and code intensive. The appearance of the interior point methods associated

with the name of Karmarkar (1984) hold some hope for the future, but currently the

programs are quite complicated.

In the solution of linear equations, my exposition has been confined to Gauss

elimination and the Choleski decomposition. The literature on this subject is,

however, vast and other algorithms exist. These can and should be used if special

circumstances arise to make them more appropriate. For instance, Zambardino

(1974) presents a form of Gauss elimination which uses less space than the one

presented here. This procedure, in ALGOL, is called QUARTERSOLVE because only

n²/4 elements are stored, though an integer vector is needed to store pivot

information and the program as given by Zambardino is quite complicated.

Many special methods can also be devised for matrices having special structures

such as diagonal bands. Wilkinson and Reinsch (1971) give several such algorithms for both linear equations and the eigenvalue problem. The programmer

with many problems of a special structure should consider these. However, I have

found that most users want a reliable general-purpose method for linear equations

because their day-to-day problems vary a great deal. I have deliberately avoided

including a great many algorithms in this volume because most users will likely be

their own implementors and not have a great deal of time to spend choosing,

coding, testing and, most of all, maintaining programs.

Another choice which has been made is that of only including algorithms which

are relatively ‘small’ in the sense that they fit into the machine all at once. For

instance, in the solution of the algebraic eigenvalue problem on large computers,

conventionally one reduces the matrix to a special form such as a tridiagonal or a

Hessenberg matrix, solves the eigenproblem of the simpler system then backtransforms the solutions. Each of the three phases of this procedure could be

fitted into a small machine. Again, for the practitioner with a lot of matrices to

solve or a special requirement for only partial solution, such methods should be

employed. For the one-at-a-time users, however, there is three times as much

program code to worry about.

The lack of double-precision arithmetic on the machines I used to develop the

algorithms which are included has no doubt modified my opinions concerning

algorithms. Certainly, any algorithm requiring inner products of vectors, that is

s = x1*y1 + x2*y2 + . . . + xn*yn                          (1.10)

cannot be executed as accurately without extended-precision arithmetic (Wilkinson 1963). This has led to my abandonment of algorithms which seek to find the

minimum of a function along a line by use of gradient information. Such

algorithms require the derivative along the line and employ an inner product to

compute this derivative. While these methods are distinctly unreliable on a

machine having only a single, low-precision arithmetic, they can no doubt be used

very effectively on other machines.
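A short Python illustration of the danger (not from the book's codes): a three-term inner product whose partial sums cancel. Accumulating in working precision loses the middle term entirely, while math.fsum, which accumulates with extended effective precision, recovers it.

```python
import math

# A pathological dot product: the first and last terms cancel, so the
# answer is carried entirely by the middle term.
x = [2.0 ** 53, 1.0, -(2.0 ** 53)]
y = [1.0, 1.0, 1.0]

naive = 0.0
for xi, yi in zip(x, y):
    naive += xi * yi    # 2**53 + 1 rounds straight back to 2**53,
                        # so the running sum ends at exactly 0

# Correctly rounded sum, playing the role of an extended-precision
# accumulator for the products.
accurate = math.fsum(xi * yi for xi, yi in zip(x, y))
```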

From the above discussion it will be evident that the principles guiding

algorithm selection have been:

(i) shortness of program which results from implementation and low storage

requirement, and

(ii) general utility of the method and importance of the problem which it solves.

To these points should be added:

(iii) proven reliability in a number of tests

(iv) the ease and speed with which a user can obtain useful results from the

algorithms.

The third point is very important. No program should be considered acceptable until

it has been tested fairly extensively. If possible, any method which gives solutions

that can be checked by computing diagnostics should compute such information

routinely. For instance, I have had users of my eigenvalue/eigenvector programs call

me to say, ‘Your program doesn’t work!’ In all cases to date they have been

premature in their judgement, since the residuals computed as a routine adjunct to

the eigensolution formation have shown the output to be reasonable even though it

might be very different from what the user expected. Furthermore, I have saved

myself the effort of having to duplicate their calculation to prove the correctness of

the results. Therefore, if at all possible, such checks are always built into my

programs.
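As a sketch of such a routine check (a hypothetical helper in Python, not one of the numbered algorithms), the residual for a proposed eigenpair takes only a few lines:

```python
def max_residual(A, lam, v):
    # Largest element of the residual r = A v - lam v, with A a list of
    # rows; a residual near machine precision (relative to the sizes of
    # A and v) is routine evidence that (lam, v) is acceptable.
    n = len(A)
    r = [sum(A[i][j] * v[j] for j in range(n)) - lam * v[i]
         for i in range(n)]
    return max(abs(ri) for ri in r)

# (3, [1, 1]) is an exact eigenpair of [[2, 1], [1, 2]]
check = max_residual([[2.0, 1.0], [1.0, 2.0]], 3.0, [1.0, 1.0])
```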

The fourth point is important if users are to be able to try out the ideas presented

in this book. As a user myself, I do not wish to spend many hours mastering the

details of a code. The programs are to be treated as tools, not an end in themselves.

These principles lead to the choice of the Givens’ reduction in algorithm 4 as a

method for the solution of least-squares problems where the amount of data is too

great to allow all of it to be stored in the memory at once. Similarly, algorithms 24

and 25 require the user to provide a rule for the calculation of the product of a

matrix and a vector as a step in the solution of linear equations or the algebraic

eigenproblem. However, the matrix itself need not be stored explicitly. This

avoids the problem of designing a special method to take advantage of one type of

matrix, though it requires rather more initiative from the user as it preserves this

measure of generality.

In designing the particular forms of the algorithms which appear, a conscious

effort has been made to avoid the requirement for many tolerances. Some

machine-dependent quantities are unfortunately needed (they can in some cases

be calculated by the program but this does lengthen the code), but as far as

possible, and especially in determining when iterative algorithms have converged,

devices are used which attempt to ensure that as many digits are determined as

the machine is able to store. This may lead to programs continuing to execute long

after acceptable answers have been obtained. However, I prefer to sin on the side

of excess rather than leave results wanting in digits. Typically, the convergence

test requires that the last and present iterates be identical to the precision of the

machine by means of a test such as

if x + delta + offset = x + offset then halt;

where offset is some modest number such as 10. On machines which have an

accumulator with extra digits, this type of test may never be satisfied, and must be

replaced by

y := x + delta + offset;

z := x + offset;

if y = z then halt;

The ‘tolerance’ in this form of test is provided by the offset: within the computer the

representations of y and z must be equal to halt execution. The simplicity of this type

of test usually saves code though, as mentioned, at the possible expense of execution

time.
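In Python the same device looks as follows (no_progress is an invented name; Python arithmetic compares stored double-precision values, so the y/z form is safe here). Note how the tolerance implied by offset disappears once x itself is large, since delta is then absorbed by x directly:

```python
def no_progress(x, delta, offset=10.0):
    # Convergence test from the text: halt when the last and present
    # iterates (shifted by a modest offset) have identical stored
    # representations.
    y = x + delta + offset
    z = x + offset
    return y == z
```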

1.6. A METHOD FOR EXPRESSING ALGORITHMS

In the first edition of this work, algorithms were expressed in step-and-description

form. This allowed users to program them in a variety of programming languages.

Indeed, one of the strengths of the first edition was the variety of implementations.

At the time it appeared, a number of computers had their own languages or dialects,

and specialisation to one programming language would have inhibited users of these

special machines. Now, however, computer users are unlikely to be willing to type in

code if a machine-readable form of an algorithm exists. Even if the programming

language must be translated, having a workable form is useful as a starting point.

The original codes for the first edition were in BASIC for a Data General NOVA.

Later these codes were made to run on a North Star Horizon. Some were made to

work on a Hewlett-Packard 9830A. Present BASIC versions run on various common

microcomputers under the Microsoft BASIC dialect; however, since I have used very

conservative coding practices, apart from occasional syntactic deviations, they

conform to ISO Minimal BASIC (ISO 6373-1984).

Rather than proof-reading the algorithms for the first edition, I re-coded them in

FORTRAN. These codes exist as NASHLIB, and were and are commercially available

from the author. I have not, however, been particularly satisfied that the FORTRAN

implementation shows the methods to advantage, since the structure of the algorithms seems to get lost in the detail of FORTRAN code. Also, the working parts of the

codes are overshadowed by the variable definitions and subroutine calls. Compact

methods are often best placed in-line rather than called as standard subprograms as I

have already indicated.

In the current edition, I want to stay close to the original step-and-description

form of the algorithms, but nevertheless wish to offer working codes which could be

distributed in machine-readable form. I have chosen to present the algorithms in

Borland Turbo Pascal. This has the following justification.

(i) Pascal allows comments to be placed anywhere in the code, so that the

original style for the algorithms, except for the typographic conventions, could be

kept.

(ii) Turbo Pascal is available for many machines and is relatively inexpensive. It

is used as a teaching tool at many universities and colleges, including the University

of Ottawa. Version 3.01a of the Turbo Pascal system was used in developing the

codes which appear here. I intend to prepare versions of the codes to run under later

versions of this programming environment.

(iii) The differences between Turbo and Standard Pascal are unlikely to be

important for the methods, so that users of other Pascal compilers can also use these

codes.

(iv) Pascal is ‘close enough’ to many other programming languages to allow for

straightforward translation of the codes.

A particular disadvantage of Turbo Pascal for my development work is that I have

yet to find a convenient mechanism allowing automatic compilation and execution of

codes, which would permit me to check a complete set of code via batch execution.

From the perspective of the physical length of the listings, the present algorithms are

also longer than I would like because Pascal requires program headings and

declarations. In the procedural parts of the codes, ‘begin’ and ‘end’ statements also

add to the line count to some extent.

From the user perspective, the requirement that matrix sizes be explicitly specified

can be a nuisance. For problems with varying matrix sizes it may be necessary to

compile separate versions of programs.


Section 1.8 notes some other details of algorithm expression which relate to the

ease of use of the codes.

1.7. GENERAL NOTATION

I have not attempted to avoid re-use of symbols within this work since this would

have required an over-large set of symbols. In fact, I have used greek letters as

little as possible to save my typists’ and typesetters’ effort. However, within

chapters and within a subject area the symbols should be consistent. There follow

some brief general points on notation.

(i) Absolute value is denoted by vertical bars about a quantity, | |.

(ii) The norm of a quantity is denoted by double vertical bars, || ||. The form of

this must be found, where necessary, from the context.

(iii) A closed interval [u, v] comprises all points x such that u < x < v. An open

interval (u, v) comprises all points x such that u < x < v.

(iv) The exponents of decimal numbers will be expressed using the symbol E as in

1·234 * 10-5 = 1·234E-5

and

6·78 * 102 = 678 = 6·78E2.

This notation has already appeared in §1.2.

1.8. SOFTWARE ENGINEERING ISSUES

The development of microcomputer software for users who are not trained in

computer science or related subjects has given rise to user interfaces which are much

more sophisticated and easy to use than were common when the first edition

appeared. Mathematical software has not escaped such developments, but source

code collections have generally required the user to consolidate program units, add

driver and user routines, and compile and run the programs. In my experience, the

lack of convenience implied by this requirement is a major obstacle to users learning

about software published in source code form. In our nonlinear estimation software

(Nash and Walker-Smith 1987), we were careful to include batch command files to

allow for easy consolidation and execution of programs. This philosophy is continued here, albeit adapted to Turbo Pascal.

1. All driver programs include code (from the fragment in file startup.pas) to

allow the user to specify a file from which the program control input and the

problem data are to be input. We refer to this as a ‘control input file’. It has a

name stored in the global string variable infname, and is referred to by the

global text variable infile. Algorithms which need input get it by read or readln

statements from infile. The input file can be ‘con’, the console keyboard.

WARNING: errors in input control files may cause source code files to be destroyed. I

believe this is a ‘bug’ in Turbo Pascal 3.01a, the version used to develop the codes.


The use of an include file which is not a complete procedure or function is not

permitted by Turbo Pascal 5.0.

2. The same program code (startup.pas) allows an output file to be specified so

that all output which appears on the console screen is copied to a file. The name

for this file is stored in the global variable confname, and the file is referred to in

programs by the global text variable confile. Output is saved by the crude but

effective means of duplicating every write(. . .) and writeln(. . .) statement with

equivalent write(confile, . . .) and writeln(confile, . . .) statements.

3. To make the algorithms less cluttered, these write and writeln statements to

confile do not appear in the listings. They are present in the files on diskette.

4. To discourage unauthorised copying of the diskette files, all commentary and

documentation of the algorithm codes has been deleted.

5. To allow for execution of the programs from operating system commands (at

least in MS-DOS), compiler directives have been included at the start of all

driver programs. Thus, if a compiled form of code dr0102.pas exists as

dr0102.com, and a file dr0102x contains text equivalent to the keyboard input

needed to correctly run this program, the command

dr0102 < dr0102x

will execute the program for the given data.


Chapter 2

FORMAL PROBLEMS IN LINEAR ALGEBRA

2.1. INTRODUCTION

A great many practical problems in the scientific and engineering world give rise

to models or descriptions of reality which involve matrices. In consequence, a very

large proportion of the literature of numerical mathematics is devoted to the

solution of various matrix equations. In the following sections, the major formal

problems in numerical linear algebra will be introduced. Some examples are

included to show how these problems may arise directly in practice. However, the

formal problems will in most cases occur as steps in larger, more difficult

computations. In fact, the algorithms of numerical linear algebra are the keystones of numerical methods for solving real problems.

Matrix computations have become a large area for mathematical and computational research. Textbooks on this subject, such as Stewart (1973) and Strang

(1976), offer a foundation useful for understanding the uses and manipulations of

matrices and vectors. More advanced works detail the theorems and algorithms for

particular situations. An important collection of well-referenced material is Golub

and Van Loan (1983). Kahaner, Moler and Nash (1989) contains a very readable

treatment of numerical linear algebra.

2.2. SIMULTANEOUS LINEAR EQUATIONS

If there are n known relationships

A_i1 x_1 + A_i2 x_2 + . . . + A_in x_n = b_i        i = 1, 2, . . . , n        (2.1)

between the n quantities xj with the coefficients Aij and right-hand side elements

bi, i = 1, 2, . . . , n, then (2.1) is a set of n simultaneous linear equations in n

unknowns xj, j = 1, 2, . . . , n. It is simpler to write this problem in matrix form

Ax = b

(2.2)

where the coefficients have been collected into the matrix A, the right-hand side is

now the vector b and the unknowns have been collected as the vector x. A further

way to write the problem is to collect each column of A (say the jth) into a column

vector (i.e. aj). Then we obtain

x_1 a_1 + x_2 a_2 + . . . + x_n a_n = b.        (2.3)

Numerous textbooks on linear algebra, for instance Mostow and Sampson

(1969) or Finkbeiner (1966), will provide suitable reading for anyone wishing to


learn theorems and proofs concerning the existence of solutions to this problem.

For the purposes of this monograph, it will suffice to outline a few basic properties

of matrices as and when required.

Consider a set of n vectors of length n, that is

a1, a2, . . . , an.

(2.4)

These vectors are linearly independent if there exists no set of parameters

xj, j = 1, 2, . . . , n (not all zero), such that

x_1 a_1 + x_2 a_2 + . . . + x_n a_n = 0        (2.5)

where 0 is the null vector having all components zero. If the vectors aj are now

assembled to make the matrix A and are linearly independent, then it is always

possible to find an x such that (2.2) is satisfied. Other ways of stating the

condition that the columns of A are linearly independent are that A has full rank

or

rank(A) = n

(2.6)

or that A is non-singular.

If only k < n of the vectors are linearly independent, then

rank(A) = k

(2.7)

and A is singular. In general (2.2) cannot be solved if A is singular, though

consistent systems of equations exist where b belongs to the space spanned by

{aj: j = 1, 2, . . . , n}.

In practice, it is useful to separate linear-equation problems into two categories.

(The same classification will, in fact, apply to all problems involving matrices.)

(i) The matrix A is of modest order with probably few zero elements (dense).

(ii) The matrix A is of high order and has most of its elements zero (sparse).

The methods presented in this monograph for large matrices do not specifically

require sparsity. The question which must be answered when computing on a small

machine is, ‘Does the matrix fit in the memory available?’

Example 2.1. Mass-spectrograph calibration

To illustrate a use for the solution of a system of linear equations, consider the

determination of the composition of a mixture of four hydrocarbons using a mass

spectrograph. Four lines will be needed in the spectrum. At these lines the

intensity for the sample will be bi, i = 1, 2, 3, 4. To calibrate the instrument,

intensities Aij for the ith line using a pure sample of the j th hydrocarbon are

measured. Assuming additive line intensities, the composition of the mixture is

then given by the solution x of

Ax = b.

Example 2.2. Ordinary differential equations: a two-point boundary-value problem

Large sparse sets of linear equations arise in the numerical solution of differential


equations. Fröberg (1965, p 256) considers the differential equation

y'' + y/(1 + x^2) = 7x

(2.8)

with the boundary conditions

y = 0        for x = 0        (2.9)

y = 2        for x = 1.       (2.10)

To solve this problem numerically, Fröberg replaces the continuum in x on the

interval [0, 1] with a set of (n + 1) points, that is, the step size on the grid is

h = 1/n. The second derivative is therefore replaced by the second difference at

point j

(y_{j+1} – 2y_j + y_{j-1})/h^2.        (2.11)

The differential equation (2.8) is therefore approximated by a set of linear

equations of which the jth is

(y_{j+1} – 2y_j + y_{j-1})/h^2 + y_j/(1 + x_j^2) = 7x_j        (2.12)

or

y_{j-1} + [h^2/(1 + x_j^2) – 2]y_j + y_{j+1} = 7x_j h^2.        (2.13)

Because y0 = 0 and yn = 2, this set of simultaneous linear equations is of order

(n - 1). However, each row involves at most three of the values yj. Thus, if the

order of the set of equations is large, the matrix of coefficients is sparse.
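To make the sparsity concrete, here is a small Python sketch of my own (not Fröberg's or the book's code). Applying a standard second-difference discretisation, the jth interior equation is y_{j-1} + (h^2/(1 + x_j^2) − 2)y_j + y_{j+1} = 7x_j h^2, so the coefficient matrix is tridiagonal and can be solved by elimination that touches only the non-zero diagonals.

```python
# A sketch of my own (not the book's code): the second-difference
# discretisation of the boundary-value problem with y(0) = 0, y(1) = 2
# yields a tridiagonal system, solved here by elimination confined to
# the non-zero diagonals (the Thomas algorithm).
n = 10
h = 1.0 / n
x = [j * h for j in range(n + 1)]
# Interior equation j (my discretisation):
#   y[j-1] + (h*h/(1 + x[j]^2) - 2)*y[j] + y[j+1] = 7*x[j]*h*h
diag = [h * h / (1.0 + x[j] ** 2) - 2.0 for j in range(1, n)]
rhs = [7.0 * x[j] * h * h for j in range(1, n)]
rhs[-1] -= 2.0          # boundary value y(1) = 2; y(0) = 0 adds nothing
# Forward elimination (the off-diagonal coefficients are all 1)...
for j in range(1, n - 1):
    m = 1.0 / diag[j - 1]
    diag[j] -= m
    rhs[j] -= m * rhs[j - 1]
# ...and back-substitution.
y = [0.0] * (n - 1)
y[-1] = rhs[-1] / diag[-1]
for j in range(n - 3, -1, -1):
    y[j] = (rhs[j] - y[j + 1]) / diag[j]
```

Only two vectors of length (n − 1) are stored; a dense representation of the same matrix would need (n − 1)^2 elements.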

2.3. THE LINEAR LEAST-SQUARES PROBLEM

As described above, n linear equations give relationships which permit n parameters to be determined if the equations give rise to linearly independent coefficient

vectors. If there are more than n conditions, say m, then all of them may not

necessarily be satisfied at once by any set of parameters x. By asking a somewhat

different question, however, it is possible to determine solutions x which in some

way approximately satisfy the conditions. That is, we wish to write

Ax ≈ b        (2.14)

where the sense of the sign ≈ is yet to be defined.

By defining the residual vector

r = b – Ax

(2.15)

we can express the lack of approximation for a given x by the norm of r

|| r ||.        (2.16)

This must fulfil the following conditions:

|| r || > 0        for r ≠ 0,  and  || 0 || = 0,        (2.17)

|| cr || = |c| · || r ||        (2.18)

for an arbitrary complex number c, and

|| r + s || ≤ || r || + || s ||        (2.19)

where s is a vector of the same order as r (that is, m).

Condition (2.19) is called the triangle inequality since the lengths of the sides of

a triangle satisfy this relationship. While there exist many norms, only a few are of

widespread utility, and by and large in this work only the Euclidean norm

|| r ||_E = (r^T r)^{1/2}

(2.20)

will be used. The superscript T denotes transposition, so the norm is a scalar. The

square of the Euclidean norm of r

r^T r = r_1^2 + r_2^2 + . . . + r_m^2        (2.21)

is appropriately called the sum of squares. The least-squares solution x of (2.14) is

that set of parameters which minimises this sum of squares. In cases where

rank(A) < n this solution is not unique. However, further conditions may be

imposed upon the solution to ensure uniqueness. For instance, it may be required

that in the non-unique case, x shall be that member of the set of vectors which

minimises r^T r which has x^T x a minimum also. In this case x is the unique

minimum-length least-squares solution.

If the minimisation of r^T r with respect to x is attempted directly, then using

(2.15) and elementary calculus gives

A^T A x = A^T b

(2.22)

as the set of conditions which x must satisfy. These are simply n simultaneous

linear equations in n unknowns x and are called the normal equations. Solution of

the least-squares problem via the normal equations is the most common method

by which such problems are solved. Unfortunately, there are several objections to

such an approach if it is not carefully executed, since the special structure of A^T A

and the numerical instabilities which attend its formation are ignored at the peril

of meaningless computed values for the parameters x.
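For a concrete and well-conditioned case, the following Python fragment of my own forms and solves the normal equations (2.22) for a small 3 by 2 fitting problem; the residual's orthogonality to the columns of A confirms the calculus conditions.

```python
# A small illustration (mine, not from the book) of forming and solving
# the normal equations A^T A x = A^T b; the 2 by 2 system is solved by
# Cramer's rule for brevity.
A = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 2.0, 2.0]
# Form A^T A and A^T b.
ata = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]
atb = [sum(A[k][i] * b[k] for k in range(3)) for i in range(2)]
det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
x = [(atb[0] * ata[1][1] - atb[1] * ata[0][1]) / det,
     (ata[0][0] * atb[1] - ata[1][0] * atb[0]) / det]
# The residual r = b - A x is orthogonal to the columns of A.
r = [b[k] - A[k][0] * x[0] - A[k][1] * x[1] for k in range(3)]
```

For this tiny, well-scaled problem the normal equations behave perfectly; the dangers discussed below arise when A^T A is formed from nearly collinear columns.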

Firstly, any matrix B such that

x^T B x > 0        for all x ≠ 0        (2.23)

is called positive definite. If

x^T B x ≥ 0        (2.24)

for all x, B is non-negative definite or positive semidefinite. The last two terms are

synonymous. The matrix A^T A gives the quadratic form

Q = x^T A^T A x

(2.25)

for any vector x of order n. By setting

y = Ax        (2.26)

we obtain

Q = y^T y ≥ 0        (2.27)

so that A^T A is non-negative definite. In fact, if the columns of A are linearly
independent, it is not possible for y to equal the order-m null vector 0, so that in
this case A^T A is positive definite. This is also called the full-rank case.

Secondly, many computer programs for solving the linear least-squares problem

ignore the existence of special algorithms for the solution of linear equations

having a symmetric, positive definite coefficient matrix. Above it has already been

established that A^T A is positive definite and symmetry is proved trivially. The

special algorithms have advantages in efficiency and reliability over the methods

for the general linear-equation problem.

Thirdly, in chapter 5 it will be shown that the formation of A^T A can lead to loss

of information. Techniques exist for the solution of the least-squares problem

without recourse to the normal equations. When there is any question as to the

true linear independence of the columns of A, these have the advantage that they

permit the minimum-length least-squares solution to be computed.

It is worth noting that the linear-equation problem of equation (2.2) can be

solved by treating it as a least-squares problem. Then for singular matrices A

there is still a least-squares solution x which, if the system of equations is

consistent, has a zero sum of squares r^T r. For small-computer users who do not

regularly need solutions to linear equations or whose equations have coefficient

matrices which are near-singular (ill conditioned is another way to say this), it is

my opinion that a least-squares solution method which avoids the formation of

A^T A is useful as a general approach to the problems in both equations (2.2) and

(2.14).

As for linear equations, linear least-squares problems are categorised by

whether or not they can be stored in the main memory of the computing device at

hand. Once again, the traditional terms dense and sparse will be used, though

some problems having m large and n reasonably small will have very few zero

entries in the matrix A.

Example 2.3. Least squares

It is believed that in the United States there exists a linear relationship between

farm money income and the agricultural use of nitrogen, phosphate, potash and

petroleum. A model is therefore formulated using, for simplicity, a linear form

(money income) = x1 + x2 (nitrogen) + x3 (phosphate) + x4 (potash) + x5 (petroleum).

(2.28)

For this problem the data are supplied as index numbers (1940 = 100) to avoid

difficulties associated with the units in which the variables are measured. By

collecting the values for the dependent variable (money income) as a vector b and

the values for the other variables as the columns of a matrix A including the

constant unity which multiplies x1, a problem

Ax ≈ b

(2.14)

is obtained. The data and solutions for this problem are given as table 3.1 and

example 3.2.


Example 2.4. Surveying-data fitting

Consider the measurement of height differences by levelling (reading heights off a

vertical pole using a levelled telescope). This enables the difference between the

heights of two points i and j to be measured as

b_ij = h_i – h_j + r_ij

(2.29)

where rij is the error made in taking the measurement. If m height differences are

measured, the best set of heights h is obtained as the solution to the least-squares

problem

minimise r^T r

(2.30)

where

r = b – Ah

and each row of A has only two non-zero elements, 1 and -1, corresponding to

the indices of the two points involved in the height-difference measurement. Sometimes the error is defined as the weighted residual

r_ij = [b_ij – (h_i – h_j)]d_ij

where dij is the horizontal distance between the two points (that is, the measurement error increases with distance).

A special feature of this particular problem is that the solution is only

determined to within a constant, h0, because no origin for the height scale has

been specified. In many instances, only relative heights are important, as in a

study of subsidence of land. Nevertheless, the matrix A is rank-deficient, so any

method chosen to solve the problem as it has been presented should be capable of

finding a least-squares solution despite the singularity (see example 19.2).
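A tiny levelling network in Python (my own illustration, not the book's data) shows the rank deficiency directly: every row of A has one +1 and one −1, so shifting all heights by a constant changes none of the measurements.

```python
# A tiny levelling network (my own illustration): three measured height
# differences among three points; each row of A has one +1 and one -1.
A = [[1, -1, 0],
     [0, 1, -1],
     [1, 0, -1]]
h = [3.0, 2.0, 0.0]
b = [sum(A[i][j] * h[j] for j in range(3)) for i in range(3)]  # exact data

def residual(hh):
    return [b[i] - sum(A[i][j] * hh[j] for j in range(3)) for i in range(3)]

# A constant shift h0 is invisible to the measurements, so A is
# rank-deficient and the heights are determined only to within h0:
shifted = [v + 7.0 for v in h]
print(residual(h))        # zero residuals
print(residual(shifted))  # equally zero residuals
```

Both candidate solutions fit the data exactly, which is why a minimum-length least-squares method (one that picks out a single member of this family) is wanted here.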

2.4. THE INVERSE AND GENERALISED INVERSE OF A MATRIX

An important concept is that of the inverse of a square matrix A. It is defined as

that square matrix, labelled A-1, such that

A^{-1} A = A A^{-1} = 1_n

(2.31)

where 1n is the unit matrix of order n. The inverse exists only if A has full rank.

Algorithms exist which compute the inverse of a matrix explicitly, but these are of

value only if the matrix inverse itself is useful. These algorithms should not be

used, for instance, to solve equation (2.2) by means of the formal expression

x = A^{-1} b

(2.32)

since this is inefficient. Furthermore, the inverse of a matrix A can be computed

by setting the right-hand side b in equation (2.2) to the n successive columns of

the unit matrix 1 n. Nonetheless, for positive definite symmetric matrices, this

monograph presents a very compact algorithm for the inverse in §8.2.

When A is rectangular, the concept of an inverse must be generalised. Corresponding to (2.32) consider solving equation (2.14) by means of a matrix A+, yet to

be defined, such that

x = A^+ b.        (2.33)


In other words, we have

A^+ A = 1_n.        (2.34)

When A has only k linearly independent columns, it will be satisfactory if

A^+ A = [ 1_k  0 ]
        [ 0    0 ]        (2.35)

but in this case x is not defined uniquely since it can contain arbitrary components

from the orthogonal complement of the space spanned by the columns of A. That

is, we have

x = A^+ b + (1_n – A^+ A) g        (2.36)

where g is any vector of order n.

The normal equations (2.22) must still be satisfied. Thus in the full-rank case, it

is straightforward to identify

A^+ = (A^T A)^{-1} A^T.

(2.37)

In the rank-deficient case, the normal equations (2.22) imply by substitution of

(2.36) that

A^T A x = A^T A A^+ b + (A^T A – A^T A A^+ A) g = A^T b.        (2.38)

If

A^T A A^+ = A^T

(2.39)

then equation (2.38) is obviously true. By requiring A+ to satisfy

A A^+ A = A        (2.40)

and

(A A^+)^T = A A^+        (2.41)

this can indeed be made to happen. The proposed solution (2.36) is therefore a

least-squares solution under the conditions (2.40) and (2.41) on A+. In order that

(2.36) gives the minimum-length least-squares solution, it is necessary that x^T x be

minimal also. But from equation (2.36) we find

x^T x = b^T (A^+)^T A^+ b + g^T (1 – A^+ A)^T (1 – A^+ A) g + 2 g^T (1 – A^+ A)^T A^+ b        (2.42)

which can be seen to have a minimum at

g = 0        (2.43)

if

(1 – A^+ A)^T

is the annihilator of A^+ b, thus ensuring that the two contributions (that is, from b

and g) to x T x are orthogonal. This requirement imposes on A + the further

conditions

A^+ A A^+ = A^+        (2.44)

(A^+ A)^T = A^+ A.        (2.45)

The four conditions (2.40), (2.41), (2.44) and (2.45) were proposed by Penrose

(1955). The conditions are not, however, the route by which A + is computed.

Here attention has been focused on one generalised inverse, called the Moore-Penrose inverse. It is possible to relax some of the four conditions and arrive at

other types of generalised inverse. However, these will require other conditions to

be applied if they are to be specified uniquely. For instance, it is possible to

consider any matrix which satisfies (2.40) and (2.41) as a generalised inverse of A

since it provides, via (2.33), a least-squares solution to equation (2.14). However,

in the rank-deficient case, (2.36) allows arbitrary components from the null space

of A to be added to this least-squares solution, so that the two-condition generalised inverse is specified incompletely.

Over the years a number of methods have been suggested to calculate ‘generalised

inverses’. Having encountered some examples of dubious design, coding or applications of such methods, I strongly recommend testing computed generalised inverse

matrices to ascertain the extent to which conditions (2.40), (2.41), (2.44) and (2.45)

are satisfied (Nash and Wang 1986).
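Such a test is simple to write. The following is a minimal Python checker of my own, applied to a small singular matrix and its known Moore-Penrose inverse, plus a deliberately wrong candidate.

```python
# A minimal checker (my own sketch) of the four Penrose conditions for a
# proposed generalised inverse Ap of A.
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def close(X, Y, tol=1e-12):
    return all(abs(a - b) <= tol
               for r1, r2 in zip(X, Y) for a, b in zip(r1, r2))

def penrose_ok(A, Ap):
    AAp = mat_mul(A, Ap)
    ApA = mat_mul(Ap, A)
    return (close(mat_mul(AAp, A), A)          # A Ap A = A
            and close(mat_mul(ApA, Ap), Ap)    # Ap A Ap = Ap
            and close(AAp, transpose(AAp))     # (A Ap)^T = A Ap
            and close(ApA, transpose(ApA)))    # (Ap A)^T = Ap A

A = [[2.0, 0.0], [0.0, 0.0]]
Ap = [[0.5, 0.0], [0.0, 0.0]]   # Moore-Penrose inverse of this A
print(penrose_ok(A, Ap))                          # True
print(penrose_ok(A, [[0.5, 0.0], [1.0, 0.0]]))    # False: Ap A not symmetric
```

The second candidate satisfies the two least-squares conditions but fails the symmetry of Ap A, so it is not the minimum-length generalised inverse.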

2.5. DECOMPOSITIONS OF A MATRIX

In order to carry out computations with matrices, it is common to decompose

them in some way to simplify and speed up the calculations. For a real m by n

matrix A, the QR decomposition is particularly useful. This equates the matrix A

with the product of an orthogonal matrix Q and a right- or upper-triangular

matrix R, that is

A = QR

(2.46)

where Q is m by m and

Q^T Q = Q Q^T = 1_m

(2.47)

and R is m by n with all elements

R_ij = 0        for i > j.        (2.48)

The QR decomposition leads to the singular-value decomposition of the matrix A

if the matrix R is identified with the product of a diagonal matrix S and an orthogonal matrix VT, that is

R = SVT

(2.49)

where the m by n matrix S is such that

S_ij = 0        for i ≠ j        (2.50)

and V, n by n, is such that

V^T V = V V^T = 1_n.        (2.51)


Note that the zeros below the diagonal in both R and S imply that, apart from

orthogonality conditions imposed by (2.47), the elements of columns (n + 1),

(n + 2), . . . , m of Q are arbitrary. In fact, they are not needed in most calculations, so will be dropped, leaving the m by n matrix U, where

U^T U = 1_n.

(2.52)

Note that it is no longer possible to make any statement regarding U U^T. Omitting

rows (n + 1) to m of both R and S allows the decompositions to be written as

A = UR = U S V^T

(2.53)

where A is m by n, U is m by n and U^T U = 1_n, R is n by n and upper-triangular, S

is n by n and diagonal, and V is n by n and orthogonal. In the singular-value

decomposition the diagonal elements of S are chosen to be non-negative.

Both the QR and singular-value decompositions can also be applied to square

matrices. In addition, an n by n matrix A can be decomposed into a product of

a lower- and an upper-triangular matrix, thus

A = LR.

(2.54)

In the literature this is also known as the LU decomposition from the use of ‘U’ for

‘upper triangular’. Here another mnemonic, ‘U’ for ‘unitary’, has been employed.

If the matrix A is symmetric and positive definite, the decomposition

A = L L^T

(2.55)

is possible and is referred to as the Choleski decomposition.

A scaled form of this decomposition with unit diagonal elements for L can be written

A = L D L^T

where D is a diagonal matrix.

To underline the importance of decompositions, it can be shown by direct

substitution that if

A = U S V^T

(2.53)

then the matrix

A^+ = V S^+ U^T        (2.56)

where

S^+_ii = 1/S_ii        for S_ii ≠ 0
S^+_ii = 0             for S_ii = 0        (2.57)

satisfies the four conditions (2.40), (2.41), (2.44) and (2.45), that is

A A^+ A = U S V^T V S^+ U^T U S V^T        (2.58)
        = U S S^+ S V^T = U S V^T = A        (2.59)

A^+ A A^+ = V S^+ U^T U S V^T V S^+ U^T
          = V S^+ S S^+ U^T = V S^+ U^T = A^+        (2.60)

and

(A^+ A)^T = (V S^+ U^T U S V^T)^T = (V S^+ S V^T)^T
          = V S^+ S V^T = A^+ A.        (2.61)

Several of the above relationships depend on the diagonal nature of S and S^+ and
on the fact that diagonal matrices commute under multiplication.

2.6. THE MATRIX EIGENVALUE PROBLEM

An eigenvalue e and eigenvector x of an n by n matrix A, real or complex, are

respectively a scalar and vector which together satisfy the equation

Ax = ex.

(2.62)

There will be up to n eigensolutions (e, x) for any matrix (Wilkinson 1965) and

finding them for various types of matrices has given rise to a rich literature. In

many cases, solutions to the generalised eigenproblem

Ax = eBx

(2.63)

are wanted, where B is another n by n matrix. For matrices which are of a size

that the computer can accommodate, it is usual to transform (2.63) into type

(2.62) if this is possible. For large matrices, an attempt is usually made to solve

(2.63) itself for one or more eigensolutions. In all the cases where the author has

encountered equation (2.63) with large matrices, A and B have fortunately been

symmetric, which provides several convenient simplifications, both theoretical and

computational.
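The symmetric transformation alluded to above can be sketched in a few lines of Python (my own 2 by 2 example, not from the book): with B = L L^T by the Choleski decomposition, A x = e B x becomes the standard problem (L^{-1} A L^{-T}) z = e z with z = L^T x. Here B is chosen diagonal so L is trivial.

```python
import math

# My own 2 by 2 sketch: reduce the generalised problem A x = e B x to a
# standard symmetric problem via the Choleski factor B = L L^T.
A = [[2.0, 1.0], [1.0, 2.0]]
B = [[1.0, 0.0], [0.0, 4.0]]
L = [[1.0, 0.0], [0.0, 2.0]]          # Choleski factor of B
# C = L^{-1} A L^{-T}; with diagonal L this is A[i][j]/(L[i][i]*L[j][j]).
C = [[A[i][j] / (L[i][i] * L[j][j]) for j in range(2)] for i in range(2)]
# Eigenvalues of the symmetric 2 by 2 matrix C by the quadratic formula.
tr = C[0][0] + C[1][1]
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
disc = math.sqrt(tr * tr - 4.0 * det)
eigs = sorted([(tr - disc) / 2.0, (tr + disc) / 2.0])
# These agree with the generalised eigenvalues of (A, B):
# det(A - e B) = 4e^2 - 10e + 3 = 0, i.e. e = (5 ± sqrt(13))/4.
```

The eigenvalues are preserved exactly by the transformation; only the eigenvectors need the back-substitution x = L^{-T} z.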

Example 2.5. Illustration of the matrix eigenvalue problem

In quantum mechanics, the use of the variation method to determine approximate

energy states of physical systems gives rise to matrix eigenvalue problems if the

trial functions used are linear combinations of some basis functions (see, for

instance, Pauling and Wilson 1935, p 180ff).

If the trial function is F, and the energy of the physical system in question is

described by the Hamiltonian operator H, then the variation principle seeks

stationary values of the energy functional

C = (F, HF)/(F, F)        (2.64)

subject to the normalisation condition

(F, F) = 1

(2.65)

where the symbol ( , ) represents an inner product between the elements

separated by the comma within the parentheses. This is usually an integral over all


the dimensions of the system. If a linear combination of some functions f_j,
j = 1, 2, . . . , n, is used for F, that is

F = x_1 f_1 + x_2 f_2 + . . . + x_n f_n        (2.66)

then the variation method gives rise to the eigenvalue problem

Ax = eBx

(2.63)

with

A_ij = (f_j, H f_i)        (2.67)

and

B_ij = (f_i, f_j)        (2.68)

It is obvious that if B is a unit matrix, that is, if

(f_i, f_j) = δ_ij = 1 for i = j,  0 for i ≠ j        (2.69)

a problem of type (2.62) arises. A specific example of such a problem is equation

(11.1).


Chapter 3

THE SINGULAR-VALUE DECOMPOSITION AND

ITS USE TO SOLVE LEAST-SQUARES PROBLEMS

3.1. INTRODUCTION

This chapter presents an algorithm for accomplishing the powerful and versatile

singular-value decomposition. This allows the solution of a number of problems to

be realised in a way which permits instabilities to be identified at the same time.

This is a general strategy I like to incorporate into my programs as much as

possible since I find succinct diagnostic information invaluable when users raise

questions about computed answers; users do not in general raise too many idle

questions! They may, however, expect the computer and my programs to produce

reliable results from very suspect data, and the information these programs

generate together with a solution can often give an idea of how trustworthy are

the results. This is why the singular values are useful. In particular, the appearance of singular values differing greatly in magnitude implies that our data are

nearly collinear. Collinearity introduces numerical problems simply because small

changes in the data give large changes in the results. For example, consider the

following two-dimensional vectors:

A = (1, 0)^T        B = (1, 0·1)^T        C = (0·95, 0·1)^T.

Vector C is very close to vector B, and both form an angle of approximately 6°

with vector A. However, while the angle between the vector sums (A + B) and

(A + C) is only about 0.07°, the angle between (B – A) and (C – A) is greater

than 26°. On the other hand, the set of vectors

A = (1, 0)^T        D = (0, 1)^T        E = (0, 0·95)^T

gives angles between (A + D) and (A + E) and between (D – A) and (E – A) of

approximately 1·5°. In summary, the sum of collinear vectors is well determined,

the difference is not. Both the sum and difference of vectors which are not

collinear are well determined.
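The angles quoted above are easy to reproduce numerically; the following Python fragment is my own check of the collinear case.

```python
import math

# Reproducing the collinearity illustration numerically (my own check of
# the angles quoted in the text).
def angle_deg(u, v):
    dot = u[0] * v[0] + u[1] * v[1]
    return math.degrees(math.acos(dot / (math.hypot(*u) * math.hypot(*v))))

A, B, C = (1.0, 0.0), (1.0, 0.1), (0.95, 0.1)
add = lambda u, v: (u[0] + v[0], u[1] + v[1])
sub = lambda u, v: (u[0] - v[0], u[1] - v[1])
print(angle_deg(add(A, B), add(A, C)))  # ~0.07 degrees: sums well determined
print(angle_deg(sub(B, A), sub(C, A)))  # ~26.6 degrees: differences are not
```

A 5% perturbation of B thus swings the difference vector through more than a quarter of a right angle while barely moving the sum.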


3.2. A SINGULAR-VALUE DECOMPOSITION ALGORITHM

It may seem odd that the first algorithm to be described in this work is designed to

compute the singular-value decomposition (svd) of a matrix. Such computations are

topics well to the back of most books on numerical linear algebra. However, it was

the algorithm below which first interested the author in the capabilities of small

computers. Moreover, while the svd is somewhat of a sledgehammer method for

many nutshell problems, its versatility in finding the eigensolutions of a real

symmetric matrix, in solving sets of simultaneous linear equations or in computing

minimum-length solutions to least-squares problems makes it a valuable building

block in programs used to tackle a variety of real problems.

This versatility has been exploited in a single small program suite of approximately

300 lines of BASIC code to carry out the above problems as well as to find inverses

and generalised inverses of matrices and to solve nonlinear least-squares problems

(Nash 1984b, 1985).

The mathematical problem of the svd has already been stated in §2.5. However,

for computational purposes, an alternative viewpoint is more useful. This considers the possibility of finding an orthogonal matrix V, n by n, which transforms the

real m by n matrix A into another real m by n matrix B whose columns are

orthogonal. That is, it is desired to find V such that

B = AV = (b1, b2, . . . , bn)

(3.1)

where

biT bj = Si² δij

(3.2)

and

VVT = VT V = 1n .

(3.3)

The Kronecker delta takes values

δij = 0 for i ≠ j

δij = 1 for i = j.

(3.4)

The quantities Si may, as yet, be either positive or negative, since only their

square is defined by equation (3.2). They will henceforth be taken arbitrarily as

positive and will be called singular values of the matrix A. The vectors

uj = bj/Sj

(3.5)

which can be computed when none of the Sj is zero, are unit orthogonal vectors.

Collecting these vectors into a real m by n matrix, and the singular values into a

diagonal n by n matrix, it is possible to write

B = US

(3.6)

UT U = 1n

(3.7)

where

1n is a unit matrix of order n.
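Equations (3.2), (3.5) and (3.7) can be checked with a toy matrix whose columns are orthogonal by construction. A Python sketch (the matrix B below is invented for illustration):

```python
import math

# a 3 by 2 matrix B whose columns are orthogonal by construction
B = [[3.0, 0.0],
     [0.0, 2.0],
     [4.0, 0.0]]
m, n = 3, 2

# singular values from the column lengths: bjT bj = Sj^2, equation (3.2)
S = [math.sqrt(sum(B[i][j] ** 2 for i in range(m))) for j in range(n)]

# uj = bj / Sj, equation (3.5)
U = [[B[i][j] / S[j] for j in range(n)] for i in range(m)]

# verify UT U = 1n, equation (3.7)
for j in range(n):
    for k in range(n):
        dot = sum(U[i][j] * U[i][k] for i in range(m))
        assert abs(dot - (1.0 if j == k else 0.0)) < 1e-12

print(S)  # prints [5.0, 2.0]
```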

In the case that some of the Sj are zero, equations (3.1) and (3.2) are still valid,

but the columns of U corresponding to zero singular values must now be

constructed such that they are orthogonal to the columns of U computed via

equation (3.5) and to each other. Thus equations (3.6) and (3.7) are also satisfied.

An alternative approach is to set the columns of U corresponding to zero singular

values to null vectors. By choosing the first k of the singular values to be the

non-zero ones, which is always possible by simple permutations within the matrix

V, this causes the matrix UT U to be a unit matrix of order k augmented to order n

with zeros. This will be written

UT U = [ 1k 0 ]
       [ 0  0 ]

(3.8)

While not part of the commonly used definition of the svd, it is useful to require

the singular values to be sorted, so that

S11 ≥ S22 ≥ S33 ≥ . . . ≥ Skk ≥ . . . ≥ Snn.

This allows (2.53) to be recast as a summation

A = Σ uj Sjj vjT        (summed over j = 1, 2, . . . , n)

(2.53a)

Partial sums of this series give a sequence of approximations

A1, A2, . . . , An

where, obviously, the last member of the sequence

An = A

since it corresponds to a complete reconstruction of the svd. The rank-one matrices

uj Sjj vjT

can be referred to as singular planes, and the partial sums (in order of decreasing

singular values) are partial svds (Nash and Shlien 1987).
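A small Python illustration of these partial sums (the orthonormal vectors and singular values below are invented for the example):

```python
import math

r2 = 1.0 / math.sqrt(2.0)
U = [[r2, r2],
     [r2, -r2]]        # orthonormal columns u1, u2
V = [[1.0, 0.0],
     [0.0, 1.0]]       # orthonormal columns v1, v2
S = [3.0, 1.0]         # sorted singular values, S11 >= S22

def partial_svd(k):
    # sum of the first k singular planes uj Sjj vjT
    return [[sum(U[i][j] * S[j] * V[l][j] for j in range(k))
             for l in range(2)] for i in range(2)]

A1 = partial_svd(1)    # rank-one approximation from the largest plane
A2 = partial_svd(2)    # complete reconstruction: An = A = U S VT
```

Truncating the sum after the large singular values is precisely the data-compression use of partial svds made in the cited work.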

A combination of (3.1) and (3.6) gives

AV = US

(3.9)

or, using (3.3), the orthogonality of V,

A = USVT

(2.53)

which expresses the svd of A.

The preceding discussion is conditional on the existence and computability of a

suitable matrix V. The next section shows how this task may be accomplished.

3.3. ORTHOGONALISATION BY PLANE ROTATIONS

The matrix V sought to accomplish the orthogonalisation (3.1) will be built up as

a product of simpler matrices

V = V(1) V(2) . . . V(z)

(3.10)

where z is some index not necessarily related to the dimensions m and n of A, the

matrix being decomposed. The matrices used in this product will be plane

rotations. If V(k) is a rotation of angle φ in the ij plane, then all elements of V(k)

will be the same as those in a unit matrix of order n except for

Vii = Vjj = cos φ        Vji = sin φ        Vij = –sin φ.

(3.11)

Thus V (k) affects only two columns of any matrix it multiplies from the right.

These columns will be labelled x and y. Consider the effect of a single rotation

involving these two columns

(X  Y) = (x  y) [ cos φ  –sin φ ]
                [ sin φ   cos φ ]

(3.12)

Thus we have

X = x cos φ + y sin φ

Y = –x sin φ + y cos φ.

(3.13)

If the resulting vectors X and Y are to be orthogonal, then

XT Y = 0 = –(xT x – yT y) sin φ cos φ + xT y(cos²φ – sin²φ).

(3.14)

There is a variety of choices for the angle φ, or more correctly for the sine and

cosine of this angle, which satisfy (3.14). Some of these are mentioned by

Hestenes (1958), Chartres (1962) and Nash (1975). However, it is convenient if

the rotation can order the columns of the orthogonalised matrix B by length, so

that the singular values are in decreasing order of size and those which are zero

(or infinitesimal) are found in the lower right-hand corner of the matrix S as in

equation (3.8). Therefore, a further condition on the rotation is that

XT X – xT x ≥ 0.

(3.15)

For convenience, the columns of the product matrix

A(k) = A V(1) V(2) . . . V(k)

(3.16)

will be denoted ai, i = 1, 2, . . . , n. The progress of the orthogonalisation is then

observable if a measure Z of the non-orthogonality is defined

Z(k) = Σ (aiT aj)²        (summed over all pairs i < j)

(3.17)

Since two columns orthogonalised in one rotation may be made non-orthogonal in

subsequent rotations, it is essential that this measure be reduced at each rotation.

Because only two columns are involved in the kth rotation, we have

Z(k) = Z(k–1) + (XT Y)² – (xT y)².

(3.18)

But condition (3.14) implies

Z(k) = Z(k–1) – (xT y)²

(3.19)

so that the non-orthogonality is reduced at each rotation.
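This monotone decrease can be verified directly: a rotation of columns j and k leaves the combined contribution of all other column pairs to Z unchanged, so Z falls by exactly (xT y)² when the rotated pair is made orthogonal. A Python check, using the angle formulae (3.20)–(3.26) of the following paragraphs (the test matrix is invented):

```python
import math

def Z(M, m, n):
    # non-orthogonality measure: sum of squared inner products over column pairs
    return sum(sum(M[i][j] * M[i][k] for i in range(m)) ** 2
               for j in range(n - 1) for k in range(j + 1, n))

M = [[1.0, 0.8, 0.3],
     [0.2, 1.0, 0.7],
     [0.0, 0.1, 1.0]]
m, n = 3, 3
Z0 = Z(M, m, n)

# rotate columns 0 and 1 so that they become orthogonal
p = sum(M[i][0] * M[i][1] for i in range(m))
q = sum(M[i][0] ** 2 for i in range(m)) - sum(M[i][1] ** 2 for i in range(m))
v = math.hypot(2.0 * p, q)
if q >= 0.0:
    c = math.sqrt((v + q) / (2.0 * v)); s = p / (v * c)
else:
    s = math.copysign(math.sqrt((v - q) / (2.0 * v)), p); c = p / (v * s)
for row in M:
    x, y = row[0], row[1]
    row[0], row[1] = c * x + s * y, -s * x + c * y

Z1 = Z(M, m, n)
print(abs(Z1 - (Z0 - p * p)) < 1e-12)  # prints True: Z drops by exactly p**2
```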

The specific formulae for the sine and cosine of the angle of rotation are (see

e.g. Nash 1975) given in terms of the quantities

p = xT y

(3.20)

q = xT x – yT y

(3.21)

and

v = (4p² + q²)½.

(3.22)

They are

cos φ = [(v + q)/(2v)]½

(3.23)

sin φ = p/(v cos φ)

(3.24)

for q ≥ 0, and

sin φ = sgn(p)[(v – q)/(2v)]½

(3.25)

cos φ = p/(v sin φ)

(3.26)

for q < 0, where

sgn(p) = 1 for p ≥ 0

sgn(p) = –1 for p < 0.

(3.27)

Note that having two forms for the calculation of the functions of the angle of

rotation permits the subtraction of nearly equal numbers to be avoided. As the

matrix nears orthogonality p will become small, so that q and v are bound to have

nearly equal magnitudes.
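The branch selection can be wrapped in a small function. This Python sketch (mine, not the book's code) returns the cosine and sine, then confirms that the rotated columns satisfy (3.14) and that the leading column does not shrink, condition (3.15):

```python
import math

def rotation(x, y):
    # cosine and sine of the rotation angle, equations (3.20)-(3.27),
    # choosing the branch that avoids subtracting nearly equal numbers
    p = sum(a * b for a, b in zip(x, y))                # p = xT y
    q = sum(a * a for a in x) - sum(b * b for b in y)   # q = xT x - yT y
    v = math.hypot(2.0 * p, q)                          # v = (4p^2 + q^2)^(1/2)
    if v == 0.0:
        return 1.0, 0.0        # both columns null: nothing to do
    if q >= 0.0:
        c = math.sqrt((v + q) / (2.0 * v))
        s = p / (v * c)
    else:
        s = math.copysign(math.sqrt((v - q) / (2.0 * v)), p)
        c = p / (v * s)
    return c, s

x = [1.0, 2.0, 0.5]
y = [0.9, 2.1, 0.4]
c, s = rotation(x, y)
X = [c * a + s * b for a, b in zip(x, y)]   # equation (3.13)
Y = [-s * a + c * b for a, b in zip(x, y)]
orth = sum(a * b for a, b in zip(X, Y))
print(abs(orth) < 1e-10)  # prints True: XT Y = 0 as (3.14) requires
```

Note that when p = 0 and q < 0 the second branch gives sin φ = 1, cos φ = 0, which is exactly the column interchange (3.28) discussed below.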

In the first edition of this book, I chose to perform the computed rotation only

when q ≥ 0, and to use

sin φ = 1

cos φ = 0

(3.28)

when q < 0. This effects an interchange of the columns of the current matrix A.

However, I now believe that it is more efficient to perform the rotations as defined in

the code presented. The rotations (3.28) were used to force nearly null columns of the

final working matrix to the right-hand side of the storage array. This will occur when

the original matrix A suffers from linear dependencies between the columns (that is,

is rank deficient). In such cases, the rightmost columns of the working matrix

eventually reflect the lack of information in the data in directions corresponding to

the null space of the matrix A. The current methods cannot do much about this lack

of information, and it is not sensible to continue computations on these columns. In

the current implementation of the method (Nash and Shlien 1987), we prefer to

ignore columns at the right of the working matrix which become smaller than a

specified tolerance. This has a side effect of speeding the calculations significantly

when rank deficient matrices are encountered.

3.4. A FINE POINT

Equations (3.15) and (3.19) ensure that the algorithm just described proceeds

towards both an orthogonalisation and an ordering of the columns of the

resulting matrix A(z). However the rotations must be arranged in some sequence

to carry this task to completion. Furthermore, it remains to be shown that some

sequences of rotations will not place the columns in disorder again. For suppose

a1 is orthogonal to all other columns and larger than any of them individually. A

sequential arrangement of the rotations to operate first on columns (1, 2), then

(1, 3), (1, 4), . . . , (1, n), followed by (2, 3), . . . , (2, n), (3, 4), . . . , ((n – 1), n) will

be called a cycle or sweep. Such a sweep applied to the matrix described can easily

yield a new a2 for which

a2T a2 > a1T a1

(3.29)

if, for instance, the original matrix has a2 = a3 and the norm of these vectors is

greater than 2^(–½) times the norm of a1. Another sweep of rotations will put

things right in this case by exchanging a1 and a2. However, once two columns

have achieved a separation related in a certain way to the non-orthogonality

measure (3.17), it can be shown that no subsequent rotation can exchange them.

Suppose that the algorithm has proceeded so far that the non-orthogonality

measure Z satisfies the inequality

Z < t²

(3.30)

where t is some positive tolerance. Then, for any subsequent rotation the

parameter p, equation (3.20), must obey

p² < t².

(3.31)

Suppose that all adjacent columns are separated in size so that

ak-1T ak-1 – akT ak > t        for k = 2, 3, . . . , n.

(3.32)

Then a rotation which changes ak (but not ak-1) cannot change the ordering of

the two columns. If x = ak, then straightforward use of equations (3.23) and (3.24)

or (3.25) and (3.26) gives

XT X – xT x = (v – q)/2 ≥ 0.

(3.33)

Using (3.31) and (3.22) in (3.33) gives

XT X – xT x < [(4t² + q²)½ – q]/2 ≤ t.

(3.34)

Thus, once columns become sufficiently separated by size and the non-orthogonality

sufficiently diminished, the column ordering is stable. When some

columns are equal in norm but orthogonal, the above theorem can be applied to

columns separated by size.

The general question of convergence in the case of equal singular values has been

investigated by T Hoy Booker (Booker 1985). The proof in exact arithmetic is

incomplete. However, for a method such as the algorithm presented here, which uses

tolerances for zero, Booker has shown that the cyclic sweeps must eventually

terminate.
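The whole method is compact enough to condense into a few lines. The sketch below is my own Python condensation of the cyclic-sweep idea (it is not Algorithm 1, and the convergence test is simplified to an absolute tolerance on p): column pairs of B = AV are rotated in the order described above until a sweep performs no rotations, after which the column lengths are the singular values.

```python
import math

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=50):
    # Orthogonalise the columns of B = A V by cyclic sweeps of plane rotations.
    # Returns (S, B, V) with B = U * S, so U's columns are B's columns / Sj.
    m, n = len(A), len(A[0])
    B = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        rotations = 0
        for j in range(n - 1):
            for k in range(j + 1, n):
                p = sum(B[i][j] * B[i][k] for i in range(m))
                q = (sum(B[i][j] ** 2 for i in range(m))
                     - sum(B[i][k] ** 2 for i in range(m)))
                if q >= 0.0 and abs(p) <= tol:
                    continue                    # pair orthogonal and ordered
                v = math.hypot(2.0 * p, q)
                if q >= 0.0:
                    c = math.sqrt((v + q) / (2.0 * v))
                    s = p / (v * c)
                else:                           # this branch also orders columns
                    s = math.copysign(math.sqrt((v - q) / (2.0 * v)), p)
                    c = p / (v * s)
                for M in (B, V):                # rotate columns j, k of B and V
                    for row in M:
                        xj, xk = row[j], row[k]
                        row[j] = c * xj + s * xk
                        row[k] = -s * xj + c * xk
                rotations += 1
        if rotations == 0:
            break
    S = [math.sqrt(sum(B[i][j] ** 2 for i in range(m))) for j in range(n)]
    return S, B, V

S, B, V = one_sided_jacobi_svd([[1.0, 2.0], [3.0, 4.0]])
print([round(sv, 4) for sv in S])  # prints [5.465, 0.366]
```

The singular values emerge already sorted because the q < 0 branch exchanges shorter columns towards the right, mirroring the ordering argument above.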

Algorithm 1. Singular-value decomposition

procedure NashSVD(nRow, nCol: integer; {size of problem}

var W: wmatrix; {working matrix}

var Z: rvector); {squares of singular values}

{alg01.pas ==

form a singular value decomposition of matrix A which is stored in the

first nRow rows of working array W and the nCol columns of this array.

The first nRow rows of W will become the product U * S of a

conventional svd, where S is the diagonal matrix of singular values.

The last nCol rows of W will be the matrix V of a conventional svd.

On return, Z will contain the squares of the singular values. An

extended form of this commentary can be displayed on the screen by

removing the comment braces on the writeln statements below.

Copyright 1988 J. C. Nash

}

Var

i, j, k, EstColRank, RotCount, SweepCount, slimit : integer;

eps, e2, tol, vt, p, h2, x0, y0, q, r, c0, s0, c2, d1, d2 : real;

procedure rotate; {STEP 10 as a procedure}

{This rotation acts on both U and V, by storing V at the bottom of U}

begin

{loop body sketched from equation (3.13); the printed listing is illegible here}

for i := 1 to nRow+nCol do

begin

d1 := W[i,j]; d2 := W[i,k];

W[i,j] := d1*c0+d2*s0; W[i,k] := -d1*s0+d2*c0

end;

end; { rotate }

begin { procedure SVD }

{ -- remove the comment braces to allow message to be displayed --

writeln('Nash Singular Value Decomposition (NashSVD).');

writeln;

writeln('The program takes as input a real matrix A.');

writeln;

writeln('Let U and V be orthogonal matrices, & S');

writeln('a diagonal matrix, such that U'' A V = S.');

writeln('Then A = U S V'' is the decomposition.');

writeln('A is assumed to have more rows than columns. If it');

writeln('does not, the svd of the transpose A'' gives the svd');

writeln('of A, since A'' = V S U''.');

writeln;

writeln('If A has nRow rows and nCol