
Chapter 7: Run Time Environments

Homework: Read Chapter 7.

7.1: Storage Organization


We are discussing storage organization from the point of view of the compiler, which must allocate space for programs to be run. In particular, we are concerned only with virtual addresses and treat them uniformly. This should be compared with an operating systems treatment, where we worry about how to map this configuration effectively to real memory. For example, see these two diagrams in my OS class notes, which illustrate an OS difficulty with our allocation method (it uses a very large virtual address range) and one solution.

Some systems require various alignment constraints. For example, 4-byte integers might need to begin at a byte address that is a multiple of four. Unaligned data might be illegal or might lower performance. To achieve proper alignment, padding is often used.

Areas (segments) of Memory

1. The code (often called text in OS-speak) is fixed size and unchanging (self-modifying code is long out of fashion). If there is OS support it could be marked execute only (or perhaps read and execute, but not write). All other areas would be marked non-executable (except for systems like Lisp that execute their data).
2. There is likely data of fixed size whose need can be determined by the compiler by examining the program's structure (and not by determining the program's execution pattern). One example is global data. Storage for this data would be allocated in the next area, right after the code. A key point is that, since the code and this area are of fixed size that does not change during execution, they, unlike the next two areas, have no need for an expansion region.
3. The stack is used for memory whose lifetime is stack-like. It is organized into activation records that are created when a procedure is called and destroyed when the procedure exits. It abuts the area of unused memory so it can grow easily. Typically the stack is stored at the highest virtual addresses and grows downward (toward small addresses). However, it is sometimes easier in describing the activation records and their uses to pretend that the addresses are increasing (so that increments are positive).
4. The heap is used for data whose lifetime is not as easily described. This data is allocated by the program itself, typically either with a language construct, such as new, or via a library function call, such as malloc(). It is deallocated either by another executable statement, such as a call to free(), or automatically by the system.
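To make the four areas concrete, here is a small C program (my own illustration, not from the book) that prints one address from each area. The exact numbers are system-dependent (and randomized on modern OSes), but the grouping into code, static data, stack, and heap is visible.

    #include <stdio.h>
    #include <stdlib.h>

    int global = 7;                           /* static data: fixed size, known to the compiler */

    int main(void) {
        int local = 5;                        /* stack: lives in main's activation record */
        int *p = malloc(100 * sizeof(int));   /* heap: lifetime not stack-like */
        printf("code:   %p\n", (void*)main);  /* text segment (casting a function pointer
                                                 to void* is a common extension) */
        printf("static: %p\n", (void*)&global);
        printf("stack:  %p\n", (void*)&local);
        printf("heap:   %p\n", (void*)p);
        free(p);                              /* explicit deallocation; see 7.4.5 */
        return 0;
    }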

7.1.1: Static Versus Dynamic Storage Allocation


Much (often most) data cannot be statically allocated. Either its size is not known at compile time or its lifetime is only a subset of the program's execution.

Early versions of Fortran used only statically allocated data. This required that each array have a constant size specified in the program. Another consequence of supporting only static allocation was that recursion was forbidden (otherwise the compiler could not tell how many versions of a variable would be needed). Modern languages, including newer versions of Fortran, support both static and dynamic allocation of memory.

The advantage of supporting dynamic storage allocation is the increased flexibility and storage efficiency possible (instead of declaring an array to have a size adequate for the largest data set, just allocate what is needed). The advantage of static storage allocation is that it avoids the run-time costs of allocation/deallocation and may permit faster code sequences for referencing the data.

An (unfortunately, all too common) error is a so-called memory leak, where a long-running program repeatedly allocates memory that it fails to delete, even after the memory can no longer be referenced. To avoid memory leaks and ease programming, several programming language systems employ automatic garbage collection. That means the run-time system itself can determine whether data can no longer be referenced and, if so, automatically deallocates it.
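The tradeoff in a few lines of C (my illustration; make_table is a hypothetical helper): the static array must be declared big enough for the worst case, while the dynamic one is sized to the actual data.

    #include <stdlib.h>

    double table[100000];                   /* static: sized for the largest possible data set */

    double *make_table(size_t n) {          /* dynamic: exactly n entries */
        return malloc(n * sizeof(double));  /* flexible, but costs a run-time call
                                               and requires a later free() */
    }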

7.2: Stack Allocation of Space


Achievements.

1. Space is shared by procedure calls that have disjoint durations (despite being unable to check disjointness statically).
2. The relative address of each nonlocal variable is constant throughout execution.

7.2.1: Activation Trees


Recall the fibonacci sequence 1,1,2,3,5,8, ... defined by f(1)=f(2)=1 and, for n>2, f(n)=f(n-1)+f(n-2). Consider the function calls that result from a main program calling f(5). On the left we show the calls and returns linearly and on the right in tree form. The latter is sometimes called the activation tree or call tree.
System starts main
  enter f(5)
    enter f(4)
      enter f(3)
        enter f(2)
        exit f(2)
        enter f(1)
        exit f(1)
      exit f(3)
      enter f(2)
      exit f(2)
    exit f(4)
    enter f(3)
      enter f(2)
      exit f(2)
      enter f(1)
      exit f(1)
    exit f(3)
  exit f(5)
main ends

int f(int n);   /* prototype so main can call f */

int a[10];

int main() {
    int i;
    for (i=0; i<10; i++) {
        a[i] = f(i);
    }
}

int f(int n) {
    if (n < 3) return 1;
    return f(n-1) + f(n-2);
}

We can make the following observations about these procedure calls.

1. If an activation of p calls q, then that activation of q terminates no later than the activation of p.
2. The order of activations (procedure calls) corresponds to a preorder traversal of the call tree.
3. The order of de-activations (procedure returns) corresponds to a postorder traversal of the call tree.
4. If execution is currently in an activation corresponding to a node N of the activation tree, then the activations that are currently live are those corresponding to N and its ancestors in the tree. They were called in the order given by the root-to-N path in the tree, and the returns will occur in the reverse order.

7.2.2: Activation Records (ARs)


The information needed for each invocation of a procedure is kept in a run-time data structure called an activation record (AR) or frame. The frames are kept in a stack called the control stack. At any point in time the number of frames on the stack is the current depth of procedure calls. For example, in the fibonacci execution shown above, when f(4) is active there are three activation records on the control stack.

ARs vary with the language and compiler implementation. Typical components are described and pictured below. In the diagrams the stack grows down the page.

1. Temporaries. For example, recall the temporaries generated during expression evaluation. Often these can be held in machine registers; when that is not possible the temporary area is used.
2. Data local to the procedure being activated.

3. Saved status from the caller, which typically includes the return address and the machine registers. The register values are restored when control returns to the caller.
4. The access link, described below.
5. The control link, which connects the ARs by pointing to the AR of the caller.
6. Returned values are normally (but not always) placed in registers.
7. The first few parameters are normally (but not always) placed in registers.

The diagram on the right shows (part of) the control stack for the fibonacci example at three points during the execution. In the upper left we have the initial state. We show the global variable a, although it is not in an activation record; it is actually allocated before the program begins execution (it is statically allocated; recall that the stack and heap are each dynamically allocated). Also shown is the activation record for main, which contains storage for the local variable i. Below the initial state we see the next state, when main has called f(1) and there are two activation records, one for main and one for f. The activation record for f contains space for the parameter n and also for the result. There are no local variables in f. At the far right is a later state in the execution, when f(4) has been called by main and has in turn called f(2). There are three activation records, one for main and two for f. It is these multiple activations of f that permit the recursive execution. There are two locations for n and two for the result.
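Schematically, one could picture an AR as a C struct like the sketch below. This is only my illustration: real frames are laid out by the compiler, not declared in source code, and the field names and sizes here are invented.

    struct activation_record {
        int   parameters[2];      /* values computed by the caller; first few often in registers */
        int   returned_value;     /* often in a register instead */
        struct activation_record *control_link;  /* points to the caller's AR */
        struct activation_record *access_link;   /* points to the AR of the immediately
                                                     outer procedure; see 7.3.5 */
        long  saved_status[4];    /* return address and saved machine registers */
        int   local_data[4];      /* locals of this procedure */
        int   temporaries[4];     /* spilled values from expression evaluation */
    };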

7.2.3: Calling Sequences


The calling sequence, executed when one procedure (the caller) calls another (the callee), allocates an activation record (AR) on the stack and fills in the fields. Part of this work is done by the caller; the remainder by the callee. Although the work is shared, the AR is called the callee's AR.

Since the procedure being called is defined in one place but called from many, there are more instances of the caller's portion of the activation code than of the callee's. Thus it is wise, all else being equal, to assign as much of the work as possible to the callee.

1. Values computed by the caller are placed before any items whose size is unknown to the caller. This way they can be referenced by the caller using fixed offsets. One possibility is to place values computed by the caller at the beginning of the activation record (AR), i.e., adjacent to the AR of the caller. Note that the number of arguments may not be the same for different calls of the same function (so-called varargs, e.g., printf() in C).
2. Fixed-length items are placed next. These include the links and the saved status.
3. Finally come items allocated by the callee whose size is known only at run time, e.g., arrays whose size depends on the parameters.
4. The stack pointer sp is between the last two, so the temporaries and local data are actually above the stack. This would seem more surprising if I used the book's terminology, which is top_sp. Fixed-length data can be referenced by fixed offsets (known to the intermediate code generator) from the sp.

The top picture illustrates the situation where a pink procedure (the caller) calls a blue procedure (the callee). Also shown is Blue's AR. Note that responsibility for this single AR is shared by both procedures. The picture is just an approximation: for example, the returned value is actually Blue's responsibility (although the space might well be allocated by Pink). Also, some of the saved status, e.g., the old sp, is saved by Pink. The bottom picture shows what happens when Blue, the callee, itself calls a green procedure and thus Blue is also a caller. You can see that Blue's responsibility includes part of its own AR as well as part of Green's.

Calling Sequence

1. The caller evaluates the arguments. (I use "arguments" for the caller, "parameters" for the callee.)
2. The caller stores the return address and the (soon-to-be-updated) sp in the callee's AR.
3. The caller increments sp so that, instead of pointing into its own AR, it points to the corresponding point in the callee's AR.
4. The callee saves the registers and other (system-dependent) information.
5. The callee allocates and initializes its local data.

6. The callee begins execution.

Return Sequence

1. The callee stores the return value near the parameters. Note that this address can be determined by the caller using the old (soon-to-be-restored) sp.
2. The callee restores sp and the registers.
3. The callee jumps to the return address.

Note that varargs are supported by this scheme.
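As a sanity check on these sequences, here is a toy C simulation (mine, not from the book) of the control stack for the fibonacci example. The explicit sp mimics the real stack pointer: the caller stores the argument and bumps sp; the callee computes, restores sp, and leaves the result where the caller can find it using its old sp.

    #include <stdio.h>

    struct frame { int n; int result; };   /* a miniature AR: parameter and result slot */
    static struct frame stack[64];         /* the control stack */
    static int sp = 0;                     /* plays the role of the stack pointer */

    int f(int n) {
        stack[sp].n = n;      /* caller: store the argument in the callee's AR */
        sp++;                 /* caller: point sp into the callee's AR */
        int r = (n < 3) ? 1 : f(n-1) + f(n-2);
        sp--;                 /* callee: restore the old sp ... */
        stack[sp].result = r; /* ... and store the result where the caller can find it */
        return r;
    }

    int main(void) {
        printf("f(5) = %d\n", f(5));       /* prints f(5) = 5 */
        return 0;
    }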

7.2.4: Variable-Length Data on the Stack


There are two flavors of variable-length data.

1. Data obtained by malloc/new have hard-to-determine lifetimes and are stored in the heap instead of the stack.
2. Data such as arrays with bounds determined by the parameters are still stack-like in their lifetimes (if A calls B, these variables of A are allocated before, and released after, the corresponding variables of B).

It is the second flavor that we wish to allocate on the stack. The goal is for the (called) procedure to be able to access these arrays using addresses determinable at compile time, even though the size of the arrays (and hence the location of all but the first) is not known until the procedure is called, and indeed often differs from one call to the next.

The solution is to leave room for pointers to the arrays in the AR. These pointers are fixed size and can thus be accessed using static offsets. Then, when the procedure is invoked and the sizes are known, the pointers are filled in and the space allocated.

A small change caused by storing these variable-size items on the stack is that it is no longer obvious where the real top of the stack is located relative to sp. Consequently another pointer (call it real-top-of-stack) is also kept. This is used on a call to tell where the new activation record should begin.
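C99 variable-length arrays are implemented in essentially this way: the frame keeps a fixed-size pointer (plus the length), and the storage itself is carved out of the stack when the procedure is entered. A minimal example:

    #include <stdio.h>

    void demo(int n) {
        int a[n];             /* size known only at run time; the AR holds a pointer to a */
        for (int i = 0; i < n; i++)
            a[i] = i * i;
        printf("a[%d] = %d\n", n-1, a[n-1]);   /* a is released when demo returns */
    }

    int main(void) {
        demo(5);              /* prints a[4] = 16 */
        demo(10);             /* same code, different size: prints a[9] = 81 */
        return 0;
    }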

7.3: Access to Nonlocal Data on the Stack


As we shall see, the ability of procedure p to access data declared outside of p (either declared globally outside of all procedures or declared inside another procedure q) offers interesting challenges.

7.3.1: Data Access Without Nested Procedures


In languages like standard C without nested procedures, visible names are either local to the procedure in question or declared globally.

1. For global names the address is known statically at compile time, providing there is only one source file. If there are multiple source files, the linker knows. In either case no reference to the activation record is needed; the addresses are known prior to execution.
2. For names local to the current procedure, the address needed is in the AR at a known-at-compile-time constant offset from the sp. In the case of variable-size arrays, the constant offset refers to a pointer to the actual storage.
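Both cases in a few lines of C (my illustration):

    int g;                 /* case 1: global, address fixed by the compiler/linker
                              before execution */

    int h(int n) {         /* case 2: n and x live at constant offsets from sp in h's AR */
        int x = n + g;
        return x;
    }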

7.3.2: Issues With Nested Procedures


With nested procedures a complication arises. Say g is nested inside f, so g can refer to names declared in f. These names refer to objects in the AR for f; the difficulty is finding that AR when g is executing. We can't tell at compile time where the (most recent) AR for f will be relative to the current AR for g, since a dynamically-determined number of routines could have been called in the middle.

There is an example in the next section, in which g refers to x, which is declared in the immediately outer scope (main), but the AR is two away because f was invoked in between. (In that example you can tell at compile time what was called in what order, but with a more complicated program having data-dependent branches, it is not possible.)

7.3.3: A language with Nested Procedure Declarations


As we have discussed, the 1e, which you have, uses Pascal, which many of you don't know. The 2e, which you don't have, uses C, which you do know. Since Pascal supports nested procedures, that is what the 1e uses to give examples. The 2e asserts (correctly) that C doesn't have nested procedures, so it introduces ML, which does (and is quite slick), but which unfortunately many of you don't know and I haven't used. Fortunately, a common extension to C is to permit nested procedures. In particular, gcc supports nested procedures. To check my memory I compiled and ran the following program.
#include <stdio.h>

int main (int argc, char *argv[]) {
    int x = 10;

    void g(int y) {
        int z = x;   /* g references x, declared in the enclosing main */
        return;
    }

    int f (int y) {
        g(y);
        return y+1;
    }

    printf("The answer is %d\n", f(x));
    return 0;
}

The program compiles without errors and the correct answer of 11 is printed. So we can use C (really the gcc et al. extension of C).

7.3.4: Nesting Depth


Outermost procedures have nesting depth 1. Other procedures have nesting depth 1 more than the nesting depth of the immediately outer procedure. In the example above main has nesting depth 1; both f and g have nesting depth 2.

7.3.5: Access Links


The AR for a nested procedure contains an access link that points to the AR of the most recent activation of the immediately outer procedure. So in the example above, the access link for all activations of f and g would point to the AR of the (only) activation of main. Then, for a procedure P to access a name defined in the 3-outer scope, i.e., the unique outer scope whose nesting depth is 3 less than that of P, you follow the access links three times. The question is how the access links are maintained.

7.3.6: Manipulating Access Links


Let's assume there are no procedure parameters. We are also assuming that the entire program is compiled at once. (For multiple files the main issues involve the linker, which is not covered in this course; I do cover it a little in the OS course.)

Without procedure parameters, the compiler knows the name of the called procedure and, since we are assuming the entire program is compiled at once, knows its nesting depth. Let the caller be procedure R (the last letter in caller) and let the called procedure be D (the last letter in called). Let N(f) be the nesting depth of f.

I did not like the presentation in 2e (which had three cases and I think did not cover the example above). I made up my own and noticed it is much closer to 1e (but makes clear the direct-recursion case, which is explained in 2e). I am surprised to see a regression from 1e to 2e, so make sure I have not missed something in the cases below.

1. N(D) > N(R). The only possibility is for D to be immediately declared inside R. Then, when compiling the call from R to D, it is easy to include code to have the access link of D point to the AR of R.
2. N(D) ≤ N(R). This includes the case D = R, i.e., a direct recursive call. For D to be in the scope of R, there must be another procedure P enclosing both D and R, with D immediately inside P, i.e., N(D) = N(P)+1 and N(R) = N(P)+1+k, with k ≥ 0.
    P() {
        D() {...}
        P1() {
            P2() {
                ...
                Pk() {
                    R() {... D(); ...}
                }
                ...
            }
        }
    }

Our goal, while creating the AR for D at the call from R, is to set the access link to point to the AR for P. Note that this entire structure in the skeleton code shown is visible to the compiler. Thus, the current (at the time of the call) AR is the one for R, and if we follow the access links k+1 times we get a pointer to the AR for P, which we can then place in the access link of the being-created AR for D. When k=0 we get the gcc code I showed before, and also the case of direct recursion, where D=R.
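A sketch in C of the chain-following itself (the names are mine; the real work is done by compiler-generated code, not a library routine):

    struct AR { struct AR *access_link; /* ... other fields ... */ };

    /* Return the AR found by following the access link 'hops' times. */
    struct AR *follow(struct AR *a, int hops) {
        while (hops-- > 0)
            a = a->access_link;
        return a;
    }

    /* At the call from R to D:  new_D_ar->access_link = follow(current_R_ar, k+1);  */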

7.3.7: Access Links for Procedure Parameters


Basically skipped. The problem is that, if f calls g with a parameter of h (or a pointer to h in C-speak), and then g calls this parameter (i.e., calls h), g might not know the context of h. The solution is for f to pass to g the pair (h, the access link of h) instead of just h. Naturally, this is done by the compiler; the programmer is unaware of access links.
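In C-like terms, the pair amounts to a tiny closure. This is only my sketch (the types and names are invented); the compiler generates this bookkeeping invisibly.

    struct AR;                              /* activation record, as before */

    struct proc_param {                     /* what f actually passes for h */
        void (*code)(struct AR *env);       /* h's code */
        struct AR *access_link;             /* h's environment */
    };

    void g(struct proc_param h) {
        h.code(h.access_link);              /* g calls its parameter with h's own context */
    }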

7.3.8: Displays

Basically skipped. In theory access links can form long chains (in practice nesting depth rarely exceeds a dozen or so). A display is an array in which entry i points to the most recent (highest on the stack) AR of depth i.
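A sketch (mine) of how a display would be maintained: on entry to a procedure of nesting depth d, the old display[d] is saved in the new AR and replaced; on exit it is restored.

    #define MAX_DEPTH 16

    struct AR { struct AR *saved_display; /* ... other fields ... */ };
    static struct AR *display[MAX_DEPTH];  /* display[i]: most recent AR at depth i */

    void on_entry(struct AR *ar, int d) {
        ar->saved_display = display[d];    /* may shadow an earlier AR at the same depth */
        display[d] = ar;
    }

    void on_exit(struct AR *ar, int d) {
        display[d] = ar->saved_display;    /* uncover the previous AR at depth d */
    }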

7.4: Heap Management


Almost all of this section is covered in the OS class.

7.4.1: The Memory Manager


Covered in OS.

7.4.2: The Memory Hierarchy of a Computer


Covered in Architecture.

7.4.3: Locality in Programs


Covered in OS.

7.4.4: Reducing (external) Fragmentation


Covered in OS.

7.4.5: Manual Deallocation Requests


Stack data is automatically deallocated when the defining procedure returns. What should we do with heap data explicitly allocated with new/malloc?

The manual method is to require that the programmer explicitly deallocate these data. Two problems arise.

1. Memory leaks. The programmer forgets to deallocate.

       loop
           allocate X
           use X
           forget to deallocate X

   As this program continues to run it will require more and more storage, even though its actual usage is not increasing significantly.

2. Dangling references. The programmer forgets that they did a deallocate.

       allocate X
       use X
       deallocate X
       (100,000 lines of code not using X)
       use X

Both can be disastrous.
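The same two bugs in concrete C (my illustration; null checks omitted and the 100,000 lines compressed into a comment):

    #include <stdlib.h>

    void leak(void) {
        int *x = malloc(sizeof *x);  /* allocate X */
        *x = 1;                      /* use X */
    }                                /* forget to deallocate X: the block is lost forever */

    void dangle(void) {
        int *x = malloc(sizeof *x);  /* allocate X */
        *x = 1;                      /* use X */
        free(x);                     /* deallocate X */
        /* ... 100,000 lines of code not using x ... */
        *x = 2;                      /* use X: undefined behavior, x now dangles */
    }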

7.5: Introduction to Garbage Collection


The system detects data that cannot be accessed (no direct or indirect references exist) and deallocates the data automatically. Covered in programming languages??? Skipped

7.5.1: Design Goals for Garbage Collectors


Skipped

7.5.2: Reachability
Skipped.

7.5.3: Reference Counting Garbage Collectors


Skipped.
