Chap 7
Chap 7
Introduction to Fortran
The Fortran Programming Language
The Fortran programming language was one of the rst (if not the rst) high level languages developed
for computers. It is referred to as a high level language to contrast it with machine language or assembly
language which communicate directly with the computers processor with very primitive instructions. Since
all that a computer can really understand are these primitive machine language instructions, a Fortran
program must be translated into machine language by a special program called a Fortran compiler before it
can be executed. Since the processors in various computers are not all the same, their machine languages are
not all the same. For a variety of reasons, not all Fortran compilers are the same. For example, more recent
Fortran compilers allow operations not allowed by earlier versions. In this chapter, we will only describe
features that one can expect to have available with whatever compiler one may have available.
Fortran was initially developed almost exclusively for performing numeric computations (Fortran is an
acronym for Formula Translation), and a host of other languages (Pascal, Ada, Cobol, C, etc.) have been
developed that are more suited to nonnumerical operations such as searching databases for information.
Fortran has managed to adapt itself to the changing nature of computing and has survived, despite repeated
predictions of its death. It is still the major language of science and is heavily used in statistical computing.
The most standard version of Fortran is referred to as Fortran 77 since it is based on a standard
established in 1977. A new standard was developed in 1990 that incorporates some of the useful ideas from
other languages but we will restrict ourselves to Fortran 77.
The Basic Elements of Fortran
A Fortran program consists of a series of lines of code. These lines are executed in the order that they
appear unless there are statements that cause execution to jump from one place in the series of lines to
another. At the end of this chapter, we have listed the code of a complete program and we will refer to this
program as we go along. Learning to write Fortran programs is usually done by imitating programs written
by others. This chapter cannot possibly provide all of the details of Fortran, but rather to point out those
features that are particularly important in statistical computing.
1. Statements begin in column 7 of a line and end before column 72 (if a statement will not t in columns
772, it can be continued onto the next line (or lines) as long as a character is placed in column 6 of
each continuation line). Columns 15 of a line can contain a statement number so that the statement
on that line can be referenced by other statements. Lines that begin with the letter c in column one
are interpreted as comments and are not executed. It is important to include enough comments in any
program in any language so that you (or somebody else) can later understand what the lines of code
are intended to do.
2. A le containing a Fortran program begins with a main program which may be followed by subprograms.
61
62 Fortran chap 7
Subprograms come in two kinds; functions and subroutines. The main program and the subprograms
it calls can be in dierent les which can be compiled separately and then linked together to form an
executable le.
4. A main program begins with a series of statements declaring the names of the variables that are going
to be used by the program and what type of variables they are (see data types below). Note that
variable names must begin with letters of the alphabet, contain letters and the digits 0 through 9, and
(to be safe as dierent compilers allow dierent numbers of characters) contain 6 characters or fewer.
The program ends with the lines stop and end.
5. Between the declarations and the stop and end there are a wide variety of operations that can be
performed, but usually a program consists of 1) reading some information from somewhere (for example,
from a le or from the keyboard from the person using the program 2) performing some numerical or
graphical task, and 3) placing the results of the task somewhere so that the user of the program can use
them (perhaps on the users screen or into a le).
6. Often there is some subtask that needs to be performed several times in a program. For example, it
may be necessary to form the plot of one vector versus another. Rather than have the same set of
lines of code in several dierent places in the main program, one can form them into what is called a
subprogram. A subprogram begins and ends with special lines of code that tell the compiler that it is
in fact a subprogram and not part of the main program. Another use of subprograms is in modular
programming where one breaks a complicated task into a series of tasks that are more managable and
writing a subprogram for each one.
Data Types
Fortran performs numerical and logical operations on variables of a few types, including:
1. Integers, that is, numbers that have no decimal part. There are two kinds of integers; long integers
(also called four-byte integers or integer*4s, they range from roughly 2
31
to 2
31
and use four bytes of
memory) and short integers (also called two-byte integers or integer*2s, they range from roughly 2
15
to 2
15
and use two bytes of memory). One declares variables to be short or long integers by statements
of the form
integer*2 n,m,k
integer*2 n,m,k
or
integer*4 n,m,k
respectively.
2. Floating point numbers, that is, numbers that can contain decimal parts. There are two kinds of
oating point numbers; double precision (also called real*8 reals, they range from roughly 10
300
to
10
300
, take eight bytes of memory, and are accurate to roughly 13 or 14 decimal places) and single
precision (also called real*4 reals, they range from roughly 10
38
to 10
38
, take four bytes of memory,
and are accurate to roughly six or seven decimal places). Single and double precision variables are
declared by lines of the form
real*4 x,y
or
chap 7 Fortran 63
real*8 x,y
respectively.
3. Complex numbers, that is variables that can take on complex values. Actually, complex variables
can be thought of as a pair of real*4 or real*8 oating point numbers. Single and double precision
complex variables are declared by lines of the form
complex*8 x,y
or
complex*16 x,y
respectively.
4. Logical variables, that is, variables that can only take on the values .TRUE. or .FALSE..
5. Character variables, that is, variables that contain characters such as letters of the alphabet or
numerical digits. A character variable can be as long as desired. To declare variables called st1 and
st2 to be of length 10 and 20 respectively, one uses the statement
character st1*10,st2*20
6. Arrays of variables. In addition to scalar variables, one can declare a variable to be a vector (one
dimensional array) or matrix (two dimensional array), or a higher dimensional array. To declare a single
precision 100 by 3 matrix called x, one would have the statement
real*4 x(100,3)
while the statement
character st1(100)*10,st2(200)*20
would declare character arrays of length 100 and 200 called st1 and st2 where each of the 100 elements of
st1 can contain 10 characters, while each of the 200 elements of st2 can contain as many as 20 characters.
Note that Fortran can only perform operations on individual elements of arrays, that is, it has no built-in
ability to perform operations (such as addition) on all the elements of arrays. Most computer systems have
libraries of subprograms that perform operations on entire arrays.
Undeclared Names
It is very good programming pactice to declare every variable that a program uses (most compilers have
a way to warn a programmer of any variables that have not been declared). However, one should be aware of
the fact that any variable whose name begins with one of the letters i, j, k, l, m, or n, is considered to be an
integer (usually an integer*4) unless otherwise declared explicitly, while any variable whose name begins
with one of the other letters of the alphabet is taken to be a real (usually real*4). Often it is useful to
follow these conventions in the naming of variables (although they should also be explicitly declared) so that
someone reading the code will know what type the variables are without having to look at the declarations.
64 Fortran chap 7
Assignments and Arithmetic Operations
An important part of a Fortran program is assigning values to variables and performing arithmetic
operations on variables. There are ve arithmetic operations, namely addition (+), subtraction (-), multi-
plication (*), division (/), and exponentiation (**). If more than one operation is contained in a statement,
then exponentiation is done rst, then multiplication and division (if there are more than one of these they
are done from left to right), and nally addition and subtraction (again if there are more than one of these,
they are done from left to right). Expressions within parentheses are done rst.
When all operands of an arithmetic expression are of the same type, then the nal type of the value
of the expression will be of that type. When they are of dierent types, the value of the expression will
be that of the last binary operation performed where the value of a binary operation of data of dierent
types takes on the type of the higher of the two types, where types are ranked from lowest to highest by
integer*2, integer*4, real*4, real*8, complex*8, and complex*16. One must be careful of expressions
such as x*(i/j) or x**(i/j) when i and j are integers since the result of the division is truncated and not
rounded.
Once an expression has been evaluated, it is converted to the type of the variable on the left hand side
of the expression.
Intrinsic Functions
Fortran has a wide variety of functions built into it. These functions are called intrinsic functions.
Examples include trigonometric and logarithm functions. The table below contains a list of the functions
that a Fortran programmer can expect to have available when writing code. Note that these functions need
not be declared before use. Also, starting with Fortran 77 the concept of generic functions was developed.
For example, in previous versions of Fortran there were four functions for nding the absolute value of a
number depending on its type. Now there is only the single function abs which will return a value having
the same type as its input type. The old functions iabs, abs, dabs, and cabs are still available as well. The
prexes i, a, d, c are used in these functions for integer, real, double precision, and comples, respectively.
Intrinsic Functions
1
functions Denition
Numerical functions:
sdc=COS(sdc), sdc=SIN(sdc), sd=TAN(sd) Trigonometric functions
sd=ACOS(sd), sd=ASIN(sd), sd=ATAN(sd) Inverse trig functions
2
sd=ATAN2(sd1,sd2)
sd=COSH(sd), sd=SINH(sd), sd=TANH(sd) Hyperbolic trigonometric functions
sdc=LOG(sdc), sd=LOG10(sd) Log functions
sdc=SQRT(sdc) Square root
isds=ABS(isdc) Absolute value
sdc=EXP(sdc) Exponentiation
isd=MOD(isd1,isd2) Remainder
3
isd=SIGN(isd1,isd2) Sign function
4
isd=MAX(isd1,isd2,...), isd=MIN(isd1,isd2,...) Maximum and minimum
s=AMAX0(i1,i2,...), s=AMIN0(i1,i2,...) Integer input, single output
i=MAX1(s1,s2,...), i=MIN1(s1,s2,...) Single input, integer output
isd=DIM(isd1,isd2) Positive dierence: MAX(isd1-isd2,0)
d=DPROD(s1,s2) Double precision product of arguments
Type conversion:
i=INT(isdc), s=REAL(isdc), d=DBLE(isdc)
c=CMPLX(isd) Input is real part, imaginary part is 0
c=CMPLX(c) Identity
chap 7 Fortran 65
c=CMPLX(isd1,isd2) First argument is real part, 2nd is imaginary part
Complex numbers:
s=REAL(c), s=AIMAG(c) Real and Imaginary parts
c=CONJG(c), s=CABS(c) Complex conjugate and modulus
Rounding and truncating5:
sd=AINT(sd), sd=ANINT(sd) Truncation and rounding
i=NINT(sd) Rounding to nearest integer
Operations with characters:
C=CHAR(i) Character having input ASCII code
i=ICHAR(C) ASCII code of input character
i=INDEX(str1,str2) Compare strings
6
i=LEN(str) Number of characters in the string
L=LGE(str1,str2), L=LGT(str1,str2), Lexicographical ordering
7
L=LLE(str1,str2), L=LLT(str1,str2)
1
The table lists the input and output type of the function, for example, the notation isds=ABS(isdc)
means that for integer, single, double, and complex input, the output is integer, single, double, and single
respectively (note that CABS(c) is single). For characters, C represents a single character while str represents
a string of characters. Logical variables are denotd by L.
2
The function ATAN2 has two arguments x
1
and x
2
and is dened to be ?????
3
The remaindering function applied to arguments x
1
and x
2
is dened to be x
1
INT(x
1
/x
2
) x
2
.
4
The sign function having two arguments x
1
and x
2
is |x
1
| if x
2
0 and |x
1
| if x
2
< 0.
5
Truncation means, for example, REAL(INT(s)). Rounding means, for example, REAL(INT(s+.5) if s 0 or
REAL(INT(s-0.5)) if s< 0.
6
If str2 is contained in str1 then the result is the position of the rst character of str2 in str1; otherwise
the result is 0.
6
For example, LGE(str1,str2) is true if the two strings are the same or if str1 follows str2 in lexicographical
order (comparing ASCII codes character by character).
Input and Output
Perhaps the most dicult part of Fortran programming are the operations of inputting information into
the program and outputting information from the program to a place where it can be used later. These
operations are performed using the read and write statements respectively. There are four basic parts of
these statements:
1. A description of where the information is coming from or going to,
2. A description of the format that the information is in if it is being read or a description of the format
that is desired for the information if it is being written,
3. The statement number of the statement to be executed next if something goes wrong during the reading
or writing,
4. The names of the variables that are to receive the information or the names of the variables whose
values are to be written.
It is almost impossible to describe all of the variations on these four basic ideas. Most Fortran pro-
grammers learn them by imitating what other programmers have done. An example which contains all four
elements is provided by the statements
read(2,10,err=20) ((x(i,j),j=1,10),i=1,5)
10 format(10x,6f10.0)
66 Fortran chap 7
The read statement says to read 50 numbers from the le having logical unit number 2 according to the
format statement having statement number 10, and assign these numbers to the upper left hand corner of
the matrix x (upper ve rows and 10 columns). The err=20 means that if something goes wrong during
the reading process (such as if a nonnumeric character were mistakenly entered in the le) then the next
statement to be executed by the program is the one having statement number 20. The 6f10.0 in the format
statement says that the 50 numbers should be read six per line, with the rst of the six in columns 1120,
the next in 21-30, and so on, and the the numbers can be anywhere in these columns as long as there is a
decimal point (note that the 10x says to skip the rst 10 columns on each linethis makes it posible to have
extraneous information in the rst 10 columns (such as a date if one were reading information collected over
time and wanted the date in the le but didnt want to input it into the program).
Opening a File for Reading or Writing
Before reading from or writing to a le, one must rst alert the computers operating system what the
name of the le is using the open statement. One also uses the open statement to attach a logical unit
number to the le (such as the 2 in the write statement above) and then uses this number in future read
statements. Note that one can have several les open at the same time, each with its own logical unit
number.
Looping and Branching
An important part of Fortran are the statements that allow one to perform loops, that is executing sets
of lines of code repeatedly. The most famous example is a do loop, which is of the form
do 10 i=i1,i2,i3
....
....
10 continue
where i1, i2, and i3 are integers (or integer expressions). The statements between the do and the continue
are all executed with the variable i having value i1, then they are done again with i having value i1+i3,
and so on, until the value of i is outside of the range i1 to i2. Note that do loops can be nested, and that
the do loop counter (the variable i in the example above) can not be changed during the loop.
Another example of a loop is the so-called implied loop which is of the form
i=1
5 continue
....
....
if(i.eq.n) go to 10
i=i+1
go to 5
10 continue
There are a number of branching constructs available. The go to used above is an unconditional branch.
The line
if(i.eq.3) go to 10
is an example of a conditional branch. In addition to .eq. we have .ne., .le., .lt., .ge., and .gt.. The
use of go tos is discouraged as they make reading the code of a program dicult. Other, more readable
branches are available such as
if(i.eq.1) then
chap 7 Fortran 67
...
...
elseif(i.eq.2) then
...
...
else
...
...
endif
Subroutines and Functions
There are two kinds of subprograms in Fortran; subroutines and functions. We have already seen
examples of functions in the intrinsic functions above, for example the statement x=sqrt(8.0*atan(1.0))
uses two intrinsic functions; atan and sqrt. Functions return single values through their names and any
number of values (scalars or arrays) through their argument list. A subroutine on the other hand returns
nothing through its name, can return any number of values through its argument list, and is invoked by a
statement of the form
call subname(x,n,m,y,z)
To illustrate writing functions and subroutines, consider the following code:
double precision function inprod(x,y,n)
dimension x(n),y(n)
inprod=0.0d0
do 10 i=1,n
10 inprod=inprod+dble(x(i))*dble(y(i))
return
end
subroutine mtmult(a,b,n,ndim,c)
dimension a(ndim,n),b(ndim,n),c(ndim,n)
do 10 i=1,n
do 10 j=1,n
c1=0.0
do 20 k=1,n
20 c1=c1+a(i,k)*b(k,j)
10 c(i,j)=c1
return
end
Thus functions and subroutines both end with the statements return and end, and begin with a line that
gives their name (note how functions and subroutines name themselves dierently and that functions declare
their own type) and species a list of arguments, that is variables that are to be used in the function or
subroutine.
There are a few simple rules that one must use in writing and calling subroutines:
1. Subprograms can call other subprograms but a subprogram can never call itself either directly or in-
directly (by calling another subprogram which in turn calls the original subprogram). Some modern
compilers allow this recursive calling, but most still do not so we will not discuss it.
2. A subroutine begins with a statement beginning with the word subroutine followed by a space, followed
by the name of the subroutine, followed (optionally) by a list of variable names in parentheses and
68 Fortran chap 7
separated by commas. These variables are referred to as the arguments of the subroutine. The variables
in the calling program are not available to the subprogram unless they are passed to it through the
argument list (one can also use a common statement but we ignore that for now). We consider arguments
in more detail later in the section discussing arrays in Fortran.
3. One of the most common errors in Fortran occurs if there is a mismatch between the type of a variable
in a main program and the type expected by the subroutine. For example, if a subroutine expects a
single precision variable and is passed a double precision number, then the number it actually receives
will not be what is expected and in many cases this will cause a problem (note that the compiler does
not check for type mismatches.)
Compiling and Linking
The rst program considered later in this chapter (which is stored in the le regswp.f) is self-contained
except that it uses a subroutine called tcdf which is in another le called tcdf.f. To compile and link
regswp.f and tcdf.f and create an executable le called regswp, one does the following (the -c switch on
the rst line says to only compile tcdf.f while the -o switch on the second line says to name the executable
le regswp).
f77 -c tcdf.f
f77 -o regswp regswp.f tcdf.o
Arrays in Fortran
Arrays are an often poorly understood part of Fortran, particularly in relation to how they are used in
subprograms. Unfortunately, arrays and subprograms are often covered at the end of introductory Fortran
courses, if at all. We will consider one dimensional arrays (vectors), two dimensional arrays (matrices having
row and column indices), and three dimensional arrays (often visualized as a vector of matrices, the rst two
indices being the row and column numbers and the third index being the matrix number). The extension to
higher dimensional arrays should be obvious.
Arrays are Statically Allocated
With a few notable exceptions, Fortran compilers force the programmer to specify the size of an array
in a main program before it is compiled and linked. This is done with a dimension statement (such as
dimension a(10,10,4), which allocates a three dimensional array a of four 10 by 10 matrices). Thus the
memory for an array is said to be statically allocated. Many programming languages allow dynamic
memory allocation, that is, the program can ask the user to specify the size of a problem and then allocate
exactly the right amount of space for the arrays that are involved. In contrast, the Fortran programmer
must anticipate the maximum size that a user might want and do the dimension accordingly.
Arrays are Stored in Memory in Column Order
Most features of a language such as Fortran do not require the programmer to keep in mind how the
program interacts with a computers memory. However, the ordering of the elements of an array is a notable
exception to this rule. The elements are stored in consecutive locations in memory in what is called column
order. For a vector this just means that consecutive elements are stored in consecutive locations in memory,
but how the ordering is done for higher dimensional arrays is not obvious. For a matrix, the basic rule is
that the elements of the rst column are stored in order followed by the elements of the second column, and
so on. Thus code such as
dimension a(4,4)
data a/11,21,31,41,12,22,32,42,13,23,33,43,14,24,34,44/
chap 7 Fortran 69
do 10 i=1,4
10 write(*,20) (a(i,j),j=1,4)
20 format(1x,4f4.0)
leads to
11. 12. 13. 14.
21. 22. 23. 24.
31. 32. 33. 34.
41. 42. 43. 44.
that is, the rst four elements in the data statement are placed into the rst column of a, the next four in
the second column, and so on.
For a three dimensional array, we have rows, columns, and matrix number. If we had dimensioned a
above by dimension a(2,2,4), then the four matrices would be
11 31 12 32 13 33 14 34
21 41 22 42 23 43 24 44,
that is, the rst four elements go into the rst matrix (with the rst two in the rst column and the next
two in the second), the next four in the second matrix, and so on.
Perhaps the best way to visualize this is to think of all arrays as vectors, with the dimensions specifying
how that vector gets reshaped into the array. The important point is that in the assignment of elements into
the array so that the rst index varies most rapidly. Thus if we added the line
write(*,20) a
to the code above, we would get
11. 21. 31. 41.
12. 22. 32. 42.
13. 23. 33. 43.
14. 24. 34. 44.
Dimensioning Arrays in a Subprogram
An important part of programming is to create subprograms (subroutines or functions) to perform often
needed calculations (such as adding or multiplying matrices). In Fortran, variables dened in one module
(main program or subprogram) are unknown to any other module unless the programmer does something
special to pass the values from one module to the other. The standard method to do this in Fortran is
to have an argument in the subprogram list and in the call to the subprogram for the variable of interest.
This is called passing a variable to the subprogram. It is still necessary to dimension any passed array in
the subroutine. Further, it is posible to use variables to dimension arrays in a subprogram (note that such
a variable must also be in the argument list).
A simple example of this is to write a subroutine which writes out the elements of an n by n matrix
subroutine matprt(b,n)
dimension b(n,n)
do 10 i=1,n
10 write(*,20) (b(i,j),j=1,n)
20 format(1x,6f12.5)
return
end
70 Fortran chap 7
Note that the variables in such a subprogram are called dummy variables and need not have the same
names as the corresponding variables in the calling subprogram.
In keeping with what we said above, our subroutine will take the rst n
2
elements being passed from
the array in the calling routine, form the matrix b by column and then write out the rows of the result. But
what are the rst n
2
elements of the array in the calling routine? In the following main program,
dimension a(4,4)
data a/11,21,31,41,12,22,32,42,13,23,33,43,14,24,34,44/
call matprt(a,3)
stop
end
the rst 9 elements of a get passed to matprt and thus the output of the program will be
11.00000 41.00000 32.00000
21.00000 12.00000 42.00000
31.00000 22.00000 13.00000
which is probably not at all what the programmer intended (presumably they wanted the upper left hand 3
by 3 portion of the matrix displayed).
The ndim problem
This last example illustrates a problem which causes the vast majority of hard-to-nd Fortran bugs.
Whats worse is that many times the programmer has no idea that there is a problem (in the example we
knew what the correct values should have been). One can only hope that the incorrect passing leads to a
big enough problem (such as division by zero) so that the subtle passing problem gets diagnosed.
I refer to this problem as the ndim problem since the solution is to include a third argument in the
subroutine (often called ndim) corresponding to the actual row dimension of the array in the calling routine
(in this case it would be 4). In our example, ndim is the natural name for this argument as it is the actual
dimension corresponding to the argument n in the calling routine. Thus the subroutine would be modied
to be
subroutine matprt(b,n,ndim)
dimension b(ndim,n)
do 10 i=1,n
10 write(*,20) (b(i,j),j=1,n)
20 format(1x,6f12.5)
return
end
and the call would be
call matprt(a,3,4)
which would give the intended output.
A Bonus
A very powerful (but often dangerous) feature of Fortran is that one can pass an array having one
number of indices (vector, matrix, three dimensional, etc.) to a subroutine which is expecting an array
having a dierent number. This allows us to do something such as
chap 7 Fortran 71
dimension x(8,2)
data x/11,21,31,41,12,22,32,42,13,23,33,43,14,24,34,44/
call mean(x(1,2),8,xbar)
write(*,10) xbar
10 format(1x,xbar = ,f12.5)
stop
end
subroutine mean(x,n,xbar)
dimension x(n)
xbar=0.0
do 10 i=1,n
10 xbar=xbar+x(i)
xbar=xbar/n
return
end
which would nd the sample mean of the second column of the matrix x. This example also illustrates
another important fact about array passing. Weve seen that a subroutine takes consecutive elements being
passed from the calling routine, shapes them into an array according to the dimension in the subprogram,
and then operates on the array. In the line call mean(x(1,2),8,xbar), we see that the argument in the
call is actually only pointing to the rst element in x to be passed (note that if the argument had just
been x, the pointer would be at the rst element in x, that is, x(1,1)). This is useful in something such as
dimension x(100),xbar(100)
n=100
do 10 i=1,n
10 x(i)=i
do 20 i=1,n
20 call mean(x(n-i+1),i,xbar(i))
which nds the mean of the last i xs for i = 1, . . . , 100.
A nal example of this is given by writing a subroutine to calculate X
T
X for an n m matrix X:
subroutine xprx(x,n,m,ndim,mdim,xtx)
dimension x(ndim,m),xtx(mdim,mdim)
double precision dbprod
c
do 10 i=1,m
do 10 j=1,i
xtx(i,j)=dbprod(x(1,i),x(1,j),n)
10 xtx(j,i)=xtx(i,j)
return
end
c
double precision function dbprod(x,y,n)
dimension x(1),y(1)
dbprod=0.0d0
c
do 10 i=1,n
10 dbprod=dbprod+dble(x(i))*dble(y(i))
return
end
There are several things to note here:
1. We need both an ndim and an mdim in this case.
72 Fortran chap 7
2. The (i, j)th element of X
T
X is the inner product of its ith column with its jth column so we are using
the inner product function dbprod and just pointing to the beginning of the two columns of interest in
the call to dbprod.
3. Since X
T
X is symmetric, xprx only calculates the the elements in the lower triangle and puts each
element into the corresponding place in the upper triangle as well.
4. The variables x and y have been dimensioned with a 1 in the function dbprod. This is consistent
with what we have said above about the role of dimensions. This sort of thing is sometimes very useful
in avoiding extra arguments. For example, if we were writing a subroutine to nd the zeros of the
polynomial g(z) =
p
j=0
j
z
j
we would probably pass
0
,
1
, . . . ,
p
in the vector alpha of length p+1
(since there is no zero index in standard Fortran) and to avoid passing the value of p + 1, we can just
dimension alpha with a 1.
An Example
We consider a program to do basic multiple linear regression which illustrates reading and writing and
how to use subroutines.
Review of Regression Analysis
A very common situation in statistics is to have a set of n objects with measurements on a set of m+1
variables for each object and we wish to explain the behavior of one of the variables y (the dependent
variable) in terms of the other m variables X
1
, . . . , X
m
(the independent or regressor variables) according
to
y
i
=
0
+
1
X
i1
+ +
m
X
im
+
i
, i = 1, . . . , n,
where y
i
and X
i1
, . . . , X
im
are the measurements for the ith object,
0
,
1
, . . . ,
m
are a set of unknown
parameters, and
i
is called the error for the ith object. The s are assumed to be statistically independent
for the dierent objects, have expectation 0, all have the same variance
2
, and are included in the model
to explain the fact that not all objects whose X values are the same will have the same y value.
The aims of the regression analysis are twofold: 1) to determine which of the Xs are important in
explaining the behavior of y, and 2) to develop a prediction formula that can be used to nd reasonable
values of y for a new object for which we only know the Xs.
An example of such a situation is the Hald data set included at the end of this report, wherein n = 13,
m = 4, y is the amount of heat generated during the hardening of cement, and the Xs are the amount of
four dierent chemicals used in the production of the cement. In the data le, y is listed rst followed by
the four Xs.
If we rewrite the multiple linear regression model above in matrix form as
y = X + ,
where y is the n1 vector of the y values, X is the n(m+1) matrix of the X values with a column of 1s
appended at the beginning, is the (m + 1) 1 vector of coecients, and is the n 1 vector of s, then
it is easy to write down the basic results of a regression analysis. First, the best linear unbiased estimators
= (X
T
X)
1
X
T
y, s
2
=
RSS
n m1
, RSS = e
T
e, e = y y, y = X
,
where the vectors y and e are called the vectors of tted values and residuals, and RSS is the residual sum
of squares.
If we further assume that the s are normally distributed, then the hypothesis that
j
is zero while the
other s are not is rejected at the signicance level if |t
j
| > t
/2,nm1
where t
/2,nm1
is the upper
/2 critical point of a t distribution having n m1 degrees of freedom and
t
j
=
s
2
(X
T
X)
1
j+1,j+1
.
chap 7 Fortran 73
Note that the denominator of t
j
is called the standard error of
j
. Typically, instead of just reporting
whether the hypothesis is rejected or not, one determines the p-value of
j
, that is, the probability that a t
with n m1 degrees of freedom is outside the interval t
j
, in which case the null hypothesis is rejected
if p
j
< .
A measure of how much of the variability in y is explained by its linear dependence on the Xs is
R
2
= 1
s
2
e
s
2
y
,
where s
2
e
and s
2
y
are the sample variances of the es and the ys, respectively.
A Multiple Linear Regression Program Using SWEEP
In this section we give as an example a regression program which
1. Asks the user for a le name containing a regression data set to analyze and a le name to receive the
results of the analysis. The program assumes that the le starts with a line containing a label to be
displayed on the output and a line with the number of observations, n, and the number of independent
variables, m, in the regression (not counting an intercept term). Each of the remaining n lines in
the input le is assumed to contain the value of the dependent variable followed by the values of the
independent variables. An example input le is given below which contains the so-called Hald Data
which has 13 observations and four independent variables:
Hald Data (From Draper and Smith, pg. 296, 304, 630)
13 4
78.5 7 26 6 60
74.3 1 29 15 52
104.3 11 56 8 20
87.6 11 31 8 47
95.9 7 52 6 33
109.2 11 55 9 22
102.7 3 71 17 6
72.5 1 31 22 44
93.1 2 54 18 22
115.9 21 47 4 26
83.8 1 40 23 34
113.3 11 66 9 12
109.4 10 68 8 12
2. Reads the data into the vector y and the matrix X (putting a column of 1s into the beginning of X so
that it is n (m + 1) after reading the data), forms the augmented cross-product matrix
A =
X
T
X X
T
y
y
T
X y
y
,
and then gets the regression information by using the matrix SWEEP algorithm.
3. The least squares estimates, their standard errors, t-statistics, and corresponding p-values are then
written out, as well as the value of R
2
. The result for the Hald Data are:
Hald Data (From Draper and Smith, pg. 296, 304, 630)
Regression results, R-Squared = 0.98237562, d.f. = 8
i beta(i) se(i) t(i) pval(i)
-----------------------------------------------------
0 62.405369 70.070959 0.890602 0.399134
1 1.551103 0.744770 2.082660 0.070822
2 0.510168 0.723788 0.704858 0.500901
74 Fortran chap 7
3 0.101909 0.754709 0.135031 0.895923
4 -0.144061 0.709052 -0.203174 0.844071
The Program
1 c -------------------------------------------
2 c REGSWP.F: Regression program using SWEEP
3 c
4 c Uses functions inprod and smean and subroutines
5 c tcdf, tpdf, betai, and gama
6 c -------------------------------------------
7
8 c ---------------------------------------------
9 c Declare variables and external functions:
10 c ---------------------------------------------
11
12 character*60 fnin,fnout,label
13
14 real*8 x(1000,20),y(1000),a(21,21),beta(21),se(21),t(21),pval(21)
15
16 real*8 cdf,pdf,inprod,rss,s2,ybar,ssy,rsq
17
18 real*8 smean
19
20 logical fnexist
21
22 data ndim/21/
23
24 c ------------
25 c Get data:
26 c ------------
27
28 write(*,10)
29 10 format( Enter file name containing the regression data: $)
30
31 read(*,20) fnin
32 20 format(a60)
33
34 inquire(file=fnin,exist=fnexist)
35
36 if(.not.fnexist) go to 100
37
38 write(*,30)
39 30 format( Enter the file name to get the output: $)
40 read(*,20) fnout
41
42 open(2,file=fnout,status=new)
43 open(1,file=fnin,status=old)
44
45 c Read first two lines of input file:
46
47 read(1,20,err=110) label
48 read(1,*,err=120) n,np
49
50 c Read data, putting Xs in columns 2,...,m+1 and 1s in 1st column:
51
52 m=np+1
53 do 40 i=1,n
54 x(i,1)=1.0
55 40 read(1,*,err=130) y(i),(x(i,j),j=2,m)
56
57 c -----------------------------------------
58 c Form the augmented crossproduct matrix:
59 c -----------------------------------------
60
61 mp1=m+1
62 do 60 i=1,m
63 a(i,mp1)=inprod(x(1,i),y,n)
64 a(mp1,i)=a(i,mp1)
65 do 50 j=1,i
66 a(i,j)=inprod(x(1,i),x(1,j),n)
67 50 a(j,i)=a(i,j)
68 60 continue
69 a(mp1,mp1)=inprod(y,y,n)
70
71 c ----------------------------------------------------------
72 c Sweep it on first m diagonals and check for singularity:
73 c ----------------------------------------------------------
chap 7 Fortran 75
74
75 call sweep(a,ndim,mp1,1,m,ier)
76
77 if(ier.eq.1) go to 90
78
79
80 c -----------------------------------------------------------------
81 c Now get LSEs, RSS, R squared, ses, t-statistics, and p-values:
82 c -----------------------------------------------------------------
83
84 rss=a(mp1,mp1)
85 s2=rss/(n-m)
86
87 ybar=smean(y,n)
88
89 do 70 i=1,n
90 70 ssy=ssy+(y(i)-ybar)**2
91
92 rsq=1.-(rss/ssy)
93
94 do 80 i=1,m
95 beta(i)=a(i,mp1)
96 se(i)=sqrt(s2*a(i,i))
97 t(i)=beta(i)/se(i)
98 call tcdf(dabs(dble(t(i))),n-m,cdf,pdf)
99 80 pval(i)=2.*(1.-cdf)
100
101 c --------------------
102 c write out results:
103 c --------------------
104
105 write(2,81) label
106 81 format(1x,a60)
107
108 write(2,82) rsq,n-m
109 82 format(/, Regression results, R-Squared = ,f10.8,
110 1 , d.f. = ,i3)
111
112 write(2,83)
113 83 format(/,10x,i,5x,beta(i),7x,se(i),8x,t(i),5x,pval(i),/
114 1 6x,53(1h-))
115
116 do 84 i=1,m
117 84 write(2,85) i-1,beta(i),se(i),t(i),pval(i)
118 85 format(6x,i5,4f12.6)
119 go to 99
120
121 c -----------------
122 c Error handling:
123 c -----------------
124
125 90 write(*,*) Regression matrix singular
126 go to 99
127
128 100 write(*,*) File doesnt exist
129 go to 99
130
131 110 write(*,*) Error reading label
132 go to 99
133
134 120 write(*,*) Error reading n and m
135 go to 99
136
137 130 write(*,135) i
138 135 format( Error reading row ,i5)
139
140 99 continue
141 stop
142 end
143 subroutine sweep(a,mdim,m,k1,k2,ier)
144 c ---------------------------------------------------------------
145 c
146 c Subroutine to sweep the mxm matrix a on its k1 st thru k2 th
147 c diagonals. ier is 1 if a is singular. mdim is row dimension
148 c of a in calling routine.
149 c
150 c ---------------------------------------------------------------
151
152 double precision a(mdim,1)
153 double precision d
76 Fortran chap 7
154
155 ier=1
156
157 do 50 k=k1,k2
158
159 if(abs(a(k,k)).lt.1.e-20) return
160
161 d=1./a(k,k)
162 a(k,k)=1.
163 do 10 i=1,m
164 a(k,i)=d*a(k,i)
165 10 if(i.ne.k) a(i,k)=-a(i,k)*d
166
167 do 20 i=1,m
168 do 20 j=1,m
169 20 if((i.ne.k).and.(j.ne.k)) a(i,j)=a(i,j)+a(i,k)*a(k,j)/d
170
171 50 continue
172
173 ier=0
174
175 return
176 end
177
178 double precision function inprod(x,y,n)
179 double precision x(n),y(n)
180
181 inprod=0.0d0
182 do 10 i=1,n
183 10 inprod=inprod+x(i)*y(i)
184
185 return
186 end
187
188 double precision function smean(x,n)
189 double precision x(n)
190
191 smean=0.0d0
192 do 10 i=1,n
193 10 smean=smean+x(i)
194 smean=smean/n
195
196 return
197 end
198
The External Statement and Numerical Integration
The Common and Named Common Statements and Newton-Raphson
See the chapter on Iterative Methods for Parameter Estimation for examples of using the common
and named common statements.