Introduction
This study guide is a summary of the lecture material covered since the midterm.
The final exam will be comprehensive, so you should be sure to understand the concepts and algorithms that we covered before the midterm. As you prepare for the exam, be sure to look over the Midterm Study Guide, which is a summary of lecture material that was covered before the midterm.
While I don't intend to include questions on the final testing tiny details from the textbook that were not covered in lecture, you are responsible for the assigned readings. Broad ideas and important points from the readings are fair game for the exam, even if we didn't cover them in lecture. I haven't included this material in the study guide. Off-limits topics from before the midterm (listed in the Midterm Study Guide) will remain off-limits for the final.
The study guide
First of all, what about JFlex and CUP?
- I would expect you to understand the basic principles behind these tools, though I won't ask you to write either a JFlex or a CUP script. However, I might ask you to evaluate simple ones, to ensure that you understand how they work.
Semantic analysis and abstract syntax trees
- These topics are explained in quite a bit of detail already in the Assignment #3 write-up; I'm not sure there's much worthwhile that I can add.
Symbol tables
- Much of a compiler's ability to understand the meaning of a source program revolves around its ability to know the meaning of the identifiers that appear in it.
- To manage the problem of tracking the meaning of identifiers, compilers use a symbol table.
- A symbol table is a mapping between identifiers and their meanings.
- The most important information that is maintained in a symbol table in early stages of compilation is type information, which facilitates much of static semantic checking.
- As compilation proceeds, additional information may be added, such as statically-determined memory addresses or offsets into activation records.
- Of course, in most programming languages, the meaning of a particular identifier may vary throughout a program.
- A language's scoping rules explain how the meaning of names is tied to declarations or uses elsewhere in the program.
- Almost all commonly-used programming languages employ static scoping rules, meaning that the use of an identifier can always be mapped back to its declaration unambiguously at compile time.
- The most common kind of static scoping rules enforce a kind of block structure on the program.
- Declarations within one block (say, a procedure, or a block statement within a procedure) only take effect within that block.
- Declarations in inner scopes (e.g. block statements within block statements) take precedence over declarations in outer ones.
- A common implementation technique for a symbol table in such a case is a scoped symbol table.
- A scoped symbol table is, essentially, a stack of symbol tables (a minimal sketch appears at the end of this section).
- Each lookup in a scoped symbol table involves looking up the name in each symbol table on the stack, starting with the top and proceeding until either the name is found or until all tables have been consulted.
- If the name is not found in any of the tables in the stack, an error is reported.
- This approach lends itself very nicely to analysis during parsing in a language in which all names must be declared before use.
- Entering a new scope causes a new symbol table to be pushed on to the stack.
- Exiting a scope causes a symbol table to be popped from the stack.
- Languages with more complex scoping rules (e.g. Java, which includes lookups throughout inheritance hierarchies) will require a more complex symbol table implementation.
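- A minimal sketch of such a table in Java, assuming a hypothetical SymbolInfo class that holds whatever the compiler knows about one identifier (e.g. its type); this is not the structure from the assignments, just the idea:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashMap;

    class SymbolInfo { String type; }          // hypothetical; holds one identifier's meaning

    public class ScopedSymbolTable {
        // The whole table is a stack of per-scope maps; the top map is the innermost scope.
        private final Deque<HashMap<String, SymbolInfo>> scopes = new ArrayDeque<>();

        public void enterScope() { scopes.push(new HashMap<>()); }
        public void exitScope()  { scopes.pop(); }

        // Declarations always go into the innermost (top) scope.
        public void declare(String name, SymbolInfo info) {
            scopes.peek().put(name, info);
        }

        // Look in each scope from innermost to outermost; null means "undeclared".
        public SymbolInfo lookup(String name) {
            for (HashMap<String, SymbolInfo> scope : scopes) {   // iterates top of stack first
                SymbolInfo info = scope.get(name);
                if (info != null) {
                    return info;
                }
            }
            return null;   // caller reports an "undeclared identifier" error
        }
    }

- Note that a lookup's cost grows with nesting depth; production compilers often use a single hash table with per-name chains instead, but the stack-of-maps version is the easiest to understand.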
Interpreters vs. compilers
- Once the scanner, parser, and semantic checker have determined a program to be lexically, syntactically, and (static) semantically correct, we've reached a fork in the road.
- From here, we could continue with intermediate code generation, optimization, and code generation, if we're interested in building a compiler.
- We can also continue in a different direction, instead building a simulator that executes the program directly. Such a simulator is generally called an interpreter.
- Compilers and interpreters share some techniques: most notably, the techniques that are used in the front end. Either way, the source program must be scanned and parsed, and its meaning must be understood.
- The difference is what is done with that understood meaning: a compiler translates it to an equivalent program in a target language, while an interpreter executes the program and generates output.
Source-level interpreters using abstract syntax trees
- A source-level interpreter is one that interprets the source code somewhat directly, rather than first translating it to an intermediate form (such as how Java is first compiled to bytecodes that are then interpreted).
- To build a source-level interpreter, some representation of the source code will need to be available to the interpreter.
- A common solution to this problem is to use an abstract syntax tree, traversing and evaluating it as the program executes (a small sketch appears at the end of this section).
- In theory, any language can be interpreted.
- Of course, interpretation represents a tradeoff: we gain flexibility, but we give up speed.
- An interpreter is capable of reporting errors in terms of the original source code, which is more difficult in a compiled program.
- Interpretation allows programs to be executed piecemeal for testing purposes, without a lengthy compilation step.
- Interpretation also makes it easier to support features such as rewriting portions of a program while executing it in a debugger.
- Unfortunately, interpretation is orders of magnitude slower than the execution of a compiled program.
- For some languages, the performance penalty is not as great as for others.
- Languages with highly dynamic characteristics, such as Scheme (with dynamic typing) or early versions of LISP (with not only dynamic typing, but also dynamic scoping), pay less of a penalty when interpreted relative to execution of a compiled version, since so much housekeeping work (maintenance of symbol tables, tracking the types of symbols) needs to be done at run time, whether the program is interpreted or compiled.
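- A minimal sketch of the idea in Java, using invented node classes for integer expressions (a real interpreter would also handle statements, variables, environments, and so on):

    // Each AST node knows how to evaluate itself; "running" the program is just
    // a recursive traversal of the tree.
    interface Expr { int eval(); }

    class IntLiteral implements Expr {
        private final int value;
        IntLiteral(int value) { this.value = value; }
        public int eval() { return value; }                      // a leaf evaluates to itself
    }

    class Add implements Expr {
        private final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        public int eval() { return left.eval() + right.eval(); } // evaluate children, then combine
    }

    public class TreeWalkingInterpreter {
        public static void main(String[] args) {
            // The AST for 2 + (3 + 4), as the front end might have built it.
            Expr tree = new Add(new IntLiteral(2), new Add(new IntLiteral(3), new IntLiteral(4)));
            System.out.println(tree.eval());                     // prints 9
        }
    }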
Linear intermediate representations
- The result of scanning and parsing is often the creation of an intermediate representation (IR) of the source program.
- An abstract syntax tree (AST) is one form of IR, one that graphically (i.e. in terms of an object graph) represents the source program.
- A typical AST represents source program abstractions, such as if statements and while loops, directly, though it is possible to build a lower-level IR.
- If we want to ultimately generate assembly code equivalent to the source program, then it is often necessary to work with a lower-level IR, one that is closer to the level of abstraction provided by an assembly language.
- Typically, this means that the IR will be linear (i.e. a list of instructions), so that it will more closely resemble assembly language.
- There are three benefits to a linear IR:
- It's closer to the assembly code that we will eventually generate, which simplifies the instruction-selecting portion of code generation somewhat.
- Since linear IR's are often more compact than graphical ones such as AST's, it is faster to process them.
- Most importantly, opportunities for optimization may be found in a linear IR that simply don't show up in an AST built directly from source code, since a well-designed linear IR conveys more information than the source code does.
- One form of linear IR is stack machine code.
- Stack-machine code assumes the presence of an operand stack, which holds the results of intermediate calculations.
- Most operations take their operands from the stack, then push their result back on to the stack.
- For example, an integer add would take the top two elements from the stack, add them, then push the result.
- Often included is a swap operation, which swaps the top two stack elements.
- There is also generally a representation of memory, which is used to store values with longer lifetimes.
- Advantages: Simple to generate; a post-order traversal of an AST, for example, can be used to generate it (a sketch of such a traversal appears below). Very compact, since most instructions don't need to name their operands explicitly.
- Disadvantage: It is not a good IR unless you're targeting a stack machine (i.e. a processor that actually uses an operand stack).
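- A minimal sketch of that post-order traversal in Java (independent of the earlier interpreter sketch), with invented node classes and made-up instruction names, just to show the shape of the generator:

    // Emitting stack-machine code is a post-order walk: children first, operator last.
    sealed interface Expr permits Num, Add {}
    record Num(int value) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}

    public class StackCodeGen {
        static void emit(Expr e) {
            if (e instanceof Num n) {
                System.out.println("push " + n.value());   // a leaf: push its value
            } else if (e instanceof Add a) {
                emit(a.left());                            // left operand...
                emit(a.right());                           // ...right operand...
                System.out.println("add");                 // ...then the operator (pops 2, pushes 1)
            }
        }

        public static void main(String[] args) {
            emit(new Add(new Num(2), new Add(new Num(3), new Num(4))));
            // prints: push 2, push 3, push 4, add, add
        }
    }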
- In a compiler that targets a more conventional processor, a more appropriate representation is three-address code.
- In three-address code, most operations look roughly like this: x = y op z, where op is some operator, y and z are source operands, and x is the target operand. The target operand may be different from the source operands.
- This is a convenient representation, because it is easy to represent in a compiler as an object with four values stored in it (an opcode, along with the three operands); a small sketch of such an object appears at the end of this section.
- It also matches up nicely with many "real" instruction sets. (One notable exception is Intel 80x86, which has destructive two-address instructions.)
- An example of a three-address code that we used this quarter is ILOC.
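- A minimal sketch of that four-field object (such instructions are often called "quadruples"); the names here are invented for illustration:

    // One three-address instruction: an opcode plus up to three operands.
    // For example, x = y + z would be stored as new Quad("add", "y", "z", "x"),
    // and the whole IR is then just a List<Quad>.
    record Quad(String op, String src1, String src2, String dest) {}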
Do you really expect me to have memorized all the ILOC instructions?
- You should be familiar enough with the ILOC instructions to...
- ...understand what a chunk of ILOC does, and also understand trade-offs in generating different blocks of ILOC for control-flow structures and expressions, as we discussed in lecture.
- ...apply optimizations similar to the ones you implemented in Assignment #6 (or perhaps others, given descriptions of them).
- The thing to remember about ILOC instructions is that they follow a regular naming convention that makes them easy to understand, even if you haven't got all the instructions memorized.
- As always, understanding trumps memorization here.
- Know the difference between regular instructions, "I" instructions (immediate), "AI" instructions (address + immediate), and "AO" instructions (address + offset).
- Know, in general, what instruction groups like "load" and "store", "add", "lshift", etc., do.
- In many cases, you can infer an instruction's meaning from its name and your prior understanding of assembly-style languages from earlier courses (see the example at the end of this section).
- The best place to get a description of all the instructions is in Appendix A of the textbook. Most of the instructions are listed in a table in the Assignment #6 write-up, though I left some out that I didn't intend for us to use in the assignment. (Notably missing from the assignment, for example, are the "comp" and "cbr_*" instructions.)
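- As a rough example of the naming convention, using the load family: load r1 ⇒ r2 loads the word whose address is in r1; loadI 4 ⇒ r2 puts the constant 4 into r2; loadAI rarp, @a ⇒ r2 loads the word at address rarp + @a; and loadAO rarp, r1 ⇒ r2 loads the word at address rarp + r1. (See Appendix A for the precise forms.)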
Run-time organization
- A compiler's job is very similar to the job of a programmer.
- A programmer maps the abstractions of some problem to the abstractions of the source language. For example, in building a Java program to manage student enrollment information, classes such as Student, Course, and Enrollment might be created.
- A compiler maps the abstractions provided by a source language into the abstractions provided by a target language. This generally means taking higher-level abstractions like classes, procedures, and expressions and turning them into lower-level ones such as instructions, memory usage, and the management of a run-time stack.
- There are a number of standard techniques for structuring high-level language abstractions in assembly-level target programs that are widely used.
- Execution of a program begins when the operating system invokes it, either on behalf of a user or for its own ends.
- Program invocation involves four major steps:
- OS allocates space for the program in memory. (This usually means allocating a logical address space unique to that program, then mapping it into virtual memory.)
- OS loads the code for the program into the logical address space.
- OS executes initialization code, if any (e.g. initializations of global variables, such as constructor calls on global objects in a C++ program).
- OS jumps to the entry point of the program (e.g. "main").
- The logical address space is typically organized with the code at one end of the address space, and all the data organized after the code.
- In an implementation of a typical programming language, the data area is split into separate portions:
- The global area, in which global variables are allocated at compile-time.
- The run-time stack, in which information about each procedure call is kept, in the form of a stack of activation records.
- The heap (or free store), in which dynamically-allocated variables with indeterminate lifetimes are stored.
- Qualities of the programming language being implemented will determine whether this design makes sense.
- For example, in Scheme, where activations of procedures are not necessarily last-in/first-out, activation records must be allocated in the heap, rather than stored on the run-time stack.
- In languages that do not support recursion, activation records can be allocated statically, instead.
- Perhaps the most important construct that we'll need to manage at run time is that of the subprogram.
- The problem is that processors do not support the high-level concept of a subprogram directly. So it must be simulated with low-level instructions instead.
- An activation of a subprogram is a complete use of it (i.e. a call to it, its execution to completion, and the return from it).
- In order for subprograms to communicate with each other, while knowing as little as possible about one another, an agreed-upon convention must be followed by both the caller and the callee.
- This convention, in total, is called subprogram linkage.
- It lists, among other things, responsibilities for callers and callees.
- For example, the caller must pass parameters to the callee in a previously agreed-upon way, so that the callee can find them.
- The caller and callee, together, must agree on who will save the processor's state (i.e. registers) so that it can be restored upon return.
- We'd like code written in different languages to be able to communicate with one another, if possible...
- ...and we'd even like code written in the same language, but compiled with different compilers for the same platform, to be able to communicate.
- For this reason, subprogram linkage is often agreed upon at operating system design time.
- In order to separate information about activations from one another, information about each activation is stored in an activation record (AR).
- The contents of AR's will vary, but a typical one might look something like this:
- local data area
- includes local variables and compiler-generated temporaries
- may be of variable size, since the number of temporaries may vary from one execution of a subprogram to another, depending on control flow
- caller's AR pointer (dynamic link)
- static link (pointer to AR of static ancestor)
- return address
- return value pointer
- register-save area
- parameters
- in a language that supports the passing of variable numbers of parameters, this may also be of variable size
- in such a language, it is typical to pass parameters in backward order, with the first one at the top and later ones below it, which allows the first n arguments to be at known offsets
- Implementation of parameter passing modes (a small sketch in code follows these items):
- pass by value and pass by value-result. Copy the value of the actual parameter into a slot in the callee's activation record. All uses of the formal parameter access (or modify) the copy. (For value-result, the copy is also written back to the actual parameter when the callee returns.)
- pass by reference. Pass a pointer to the actual parameter instead of the actual parameter itself. Store that pointer in the callee's AR. Whenever the formal parameter is accessed, the pointer is dereferenced.
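- A minimal sketch of the difference in Java, modeling memory as one flat array and each AR as a base address into it (the offsets and names are invented):

    public class ParamPassingSketch {
        static int[] mem = new int[1024];        // pretend memory

        // Suppose the caller's variable x lives at offset 3 in the caller's AR,
        // and the callee's formal parameter occupies offset 2 in the callee's AR.
        static final int X_OFFSET = 3, PARAM_OFFSET = 2;

        // Pass by value: copy x's value into the parameter slot of the callee's AR.
        static void passByValue(int callerARP, int calleeARP) {
            mem[calleeARP + PARAM_OFFSET] = mem[callerARP + X_OFFSET];
        }

        // Pass by reference: store x's address in the parameter slot instead.
        static void passByReference(int callerARP, int calleeARP) {
            mem[calleeARP + PARAM_OFFSET] = callerARP + X_OFFSET;
        }

        // Inside the callee, a by-value parameter is used directly...
        static int readValueParam(int calleeARP) {
            return mem[calleeARP + PARAM_OFFSET];
        }

        // ...while a by-reference parameter is dereferenced on every access.
        static int readReferenceParam(int calleeARP) {
            return mem[mem[calleeARP + PARAM_OFFSET]];
        }
    }

- This is also one reason a variable passed by reference generally has to live in memory around the call, as noted under Intermediate code generation below.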
- Returning values is slightly more complicated, since the returned value must outlive the callee's activation (and its AR).
- One strategy is to store a return value pointer in the callee's AR, which points to an allocated slot in the caller's AR (perhaps in its local data area).
- This strategy doesn't work if the caller is unaware of how large the return value may be. In this case, a pointer to a heap-allocated return value should be returned instead.
- Accessing variables:
- Accessing global variables is straightforward. They can be accessed directly at their statically-determined address.
- Local variables are also straightforward to access. Assuming that each is laid out at a known offset in its AR, it can be accessed at that known offset, assuming some register points to the current AR.
- Non-local variables are trickier.
- Assuming a block-structured language, each AR can maintain a static link, which points to the AR for the most recent activation of the lexically-enclosing subprogram.
- Any use of a nonlocal variable can be denoted by a static-distance coordinate, which is explained in more detail in the Assignment #5 write-up.
- Maintaining the static links at run time can be done with a simple algorithm, assuming that each subprogram has a level number that indicates its depth (i.e. how many subprograms it is nested within); a sketch in code follows these rules:
- A call to a level 0 subprogram results in the callee's static link pointing to null.
- A call from level n to level n + 1 results in the callee's static link pointing to the caller's AR.
- A call from level n to level n results in the callee's static link pointing where the caller's static link points.
- A call from level m to level n (where n < m) results in the callee's static link pointing to the AR reached by following m - n + 1 static links, starting from the caller's AR.
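- The same rules as a minimal Java sketch (the AR fields and method names are invented; a real compiler would emit this logic inline as a handful of load instructions):

    // One activation record; only the fields needed for the sketch are shown.
    class AR {
        AR staticLink;      // AR of the most recent activation of the lexical parent
        AR dynamicLink;     // caller's AR
        int[] locals;       // local data area
    }

    public class StaticLinks {
        // Compute the static link to store in the callee's AR, given the caller's
        // AR and the two level numbers (callerLevel = m, calleeLevel = n).
        static AR staticLinkForCallee(AR callerAR, int callerLevel, int calleeLevel) {
            if (calleeLevel == 0) {
                return null;                       // level-0 subprograms have no static ancestor
            }
            if (calleeLevel == callerLevel + 1) {
                return callerAR;                   // callee is nested directly inside the caller
            }
            // Otherwise (calleeLevel <= callerLevel): walk m - n + 1 links up from the caller's AR.
            AR ancestor = callerAR;
            for (int i = 0; i < callerLevel - calleeLevel + 1; i++) {
                ancestor = ancestor.staticLink;
            }
            return ancestor;
        }

        // Accessing a non-local variable with static-distance coordinate (hops, offset):
        // follow 'hops' static links from the current AR, then index into that AR's locals.
        static int loadNonLocal(AR currentAR, int hops, int offset) {
            AR target = currentAR;
            for (int i = 0; i < hops; i++) {
                target = target.staticLink;
            }
            return target.locals[offset];
        }
    }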
Intermediate code generation
- When implementing high-level constructs with low-level intermediate instructions, a compiler has many options.
- Some strategies will yield code that runs faster. Some will yield code that uses less memory or less power.
- Many of the trade-offs that must be made in selecting an implementation strategy for a high-level construct revolve around the memory hierarchy.
- From a simplistic point of view, there are four places where information is typically stored:
- Registers. Fast but expensive, so there typically aren't very many of them. Can usually be accessed in one cycle.
- Cache. Nearly as fast as registers, but cheaper so that there can be more. Can usually be accessed in a few cycles.
- Main memory. Significantly slower than cache, but significantly cheaper. At least tens to hundreds of times slower than registers, a disparity that's getting progressively worse as processors are speeding up more rapidly than memory.
- Disk. Millions of times slower than registers, but extremely cheap.
- The most obvious way to speed a program up is to keep data in registers whenever possible.
- But, of course, there are a finite number of registers available.
- In intermediate code, we sidestep this problem by using an unlimited number of virtual registers...
- ...and counting on a register allocator to map the set of virtual registers to a set of actual registers toward the end of compilation.
- Even with an infinite number of registers, not all data can be stored in registers.
- If more than one name can be used to access a variable, it can't be stored in a register, unless the compiler is fully aware of all the possible names for it in every context.
- When multiple names can be used and procedure boundaries are crossed (e.g. when pass by reference parameter passing is used), the value must be stored in memory before the call, and loaded afterward.
- In fact, as soon as the address of the variable is taken, it becomes significantly more difficult to keep a value in a register.
- And in a language like C, with arbitrary pointer arithmetic, the difficulty level is even higher, since pointers can theoretically point anywhere and treat any value as being of any data type.
- An additional complication that affects whether (and how) values might be stored in registers is the presence of multiple threads.
- Generating intermediate code for the arithmetic expression a + b requires three steps:
- First, generate code to load the value of a into a register (if it's not in a register already).
- Next, generate code to load the value of b into a register (if it's not already).
- Finally, generate an add instruction.
- The resulting ILOC sequence might look something like this, assuming a and b are local variables stored in the current activation record:
- loadI @a ⇒ r1
- loadAO rarp, r1 ⇒ ra
- loadI @b ⇒ r2
- loadAO rarp, r2 ⇒ rb
- add ra, rb ⇒ ra+b
- A naive approach for generating code for expressions, then, is a postorder tree traversal.
- We keep track of which variables are stored in registers as we go along.
- We load values into registers only when necessary (i.e. only when we need them and they're not loaded already).
- We generate code for the operators only after the operands are loaded.
- The main problem with this approach is that it may not minimize register usage:
- If the left-hand side requires more registers in order to perform its computation than the right-hand side does...
- ...we'd be better off doing the right-hand side of the computation first.
- So the best move is to first traverse the tree to determine which subtree requires more registers, and then generate code for that subtree first (a sketch of this appears at the end of this section).
- The presence of function calls in expressions complicates matters somewhat, since functions may have side effects.
- It will be necessary to store values in memory before the call and load them afterward, if it's possible for the function to change them.
- A separate document titled ILOC Examples demonstrates and explains intermediate code generation for boolean expressions and loops, as we discussed them in lecture.
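- A minimal sketch of that ordering idea in Java for binary expression trees: label each subtree with the number of registers it needs (sometimes called Sethi-Ullman numbering), then emit code for the needier subtree first. The node classes and instruction strings are invented for illustration:

    sealed interface Node permits Leaf, BinOp {}
    record Leaf(String name) implements Node {}
    record BinOp(String op, Node left, Node right) implements Node {}

    public class ExprCodeGen {
        // Registers needed to evaluate a subtree, assuming every operand must be in a register.
        static int need(Node n) {
            if (n instanceof Leaf) {
                return 1;
            }
            BinOp b = (BinOp) n;
            int l = need(b.left()), r = need(b.right());
            return (l == r) ? l + 1 : Math.max(l, r);
        }

        static int nextReg = 1;

        // Emit (pretend) instructions for n; returns the register holding the result.
        static String emit(Node n) {
            if (n instanceof Leaf leaf) {
                String r = "r" + nextReg++;
                System.out.println("load " + leaf.name() + " => " + r);
                return r;
            }
            BinOp b = (BinOp) n;
            String leftReg, rightReg;
            if (need(b.left()) >= need(b.right())) {       // generate the needier side first
                leftReg = emit(b.left());
                rightReg = emit(b.right());
            } else {
                rightReg = emit(b.right());
                leftReg = emit(b.left());
            }
            String r = "r" + nextReg++;
            System.out.println(b.op() + " " + leftReg + ", " + rightReg + " => " + r);
            return r;
        }
    }

- With an unlimited supply of virtual registers, either order is correct; the ordering matters because it reduces how many values are live at once, which helps the register allocator later.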
Optimization
- An optimizer takes intermediate code as input. Its output is "better" intermediate code, equivalent in effect but doing the job in a better way (i.e. faster, with less memory usage, etc.).
- The primary job of an optimizer is to understand the details of some portion of the intermediate code program. The scope of an optimization is the portion of the program that is analyzed.
- The smallest scope is typically one basic block. A basic block is a straight-line list of instructions, with a label appearing (optionally) on the first instruction and a jump or conditional branch (optionally) appearing as the last instruction.
- In other words, once the first instruction of a basic block executes, all of its instructions are guaranteed to execute, in order, from beginning to end.
- The first step for many optimizations is to build a control-flow graph (CFG).
- A CFG captures the control flow between basic blocks. Individual nodes of the graph represent basic blocks. Edges indicate possible flows of control between them.
- I provided the code for this in Assignment #6. Here's (roughly) the algorithm to find all the leaders (i.e. the instructions that begin basic blocks); a sketch in code follows the algorithm.
- For each instruction i:
- If i is the first instruction in the list, it's a leader.
- If i has a label on it, it's a leader.
- If i follows an instruction that is a conditional branch or a jump, it's a leader.
- Once all the leaders have been found, nodes can be built to represent each basic block, and edges can be added based on the labels appearing on leaders and in jumps/branches.
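- The same leader-finding algorithm as a minimal Java sketch, assuming a hypothetical Instruction type that can answer the two questions the algorithm asks (these are not the classes from Assignment #6):

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical view of one intermediate instruction.
    interface Instruction {
        boolean hasLabel();
        boolean isJumpOrBranch();
    }

    public class Leaders {
        // Returns the indices of the instructions that begin basic blocks.
        static List<Integer> findLeaders(List<Instruction> code) {
            List<Integer> leaders = new ArrayList<>();
            for (int i = 0; i < code.size(); i++) {
                boolean first = (i == 0);                                  // entry instruction
                boolean labeled = code.get(i).hasLabel();                  // possible branch target
                boolean afterBranch = i > 0 && code.get(i - 1).isJumpOrBranch();
                if (first || labeled || afterBranch) {
                    leaders.add(i);
                }
            }
            return leaders;   // each block runs from one leader up to (not including) the next
        }
    }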
- There are a variety of optimizations that can be performed. As a starting point, be sure you understand the ones that were implemented in Assignment #6.
- Local optimizations can be extended to become superlocal optimizations. A superlocal optimization operates on an extended basic block (EBB), rather than one basic block.
- An EBB is a set of basic blocks B1, B2, ..., Bn, in which B1 may have more than one predecessor in the CFG, but all other blocks will have a unique predecessor in the EBB.
- In other words, the basic blocks in an EBB are a subset of the nodes in the CFG that, taken together, form a tree.
- Since an EBB is, essentially, a tree of basic blocks, optimizations such as constant propagation and folding can easily be extended to work on EBB's instead of basic blocks.
- Since a unique predecessor is guaranteed to execute before each node in the EBB, optimizations can be done in a treelike fashion, with information carried down from one node to another along each path in the tree (a sketch of this appears at the end of this section).
- Complications in optimization typically arise at join points in the CFG (i.e. nodes with multiple predecessors). Superlocal optimizations deal with such nodes as though they have no predecessors, i.e. discarding any known information before beginning to process them.
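- A minimal sketch of that treelike flow of facts in Java, using constant propagation as the example; the Block interface and the handful of opcodes handled are invented, and a real optimizer would also rewrite the folded instructions:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class SuperlocalConstants {
        // Hypothetical view of a basic block and its successors within one EBB (a tree).
        interface Block {
            List<Quad> instructions();
            List<Block> ebbSuccessors();
        }
        record Quad(String op, String src1, String src2, String dest) {}

        // Walk the EBB from its root, carrying "which names hold known constants" down each path.
        // Each child gets its own copy of the facts, so one path's facts never leak into a sibling.
        static void propagate(Block b, Map<String, Integer> inherited) {
            Map<String, Integer> facts = new HashMap<>(inherited);
            for (Quad q : b.instructions()) {
                if (q.op().equals("loadI")) {                              // rX <- constant
                    facts.put(q.dest(), Integer.valueOf(q.src1()));
                } else if (q.op().equals("add")
                           && facts.containsKey(q.src1())
                           && facts.containsKey(q.src2())) {               // both operands known: fold
                    facts.put(q.dest(), facts.get(q.src1()) + facts.get(q.src2()));
                } else {
                    facts.remove(q.dest());                                // destination no longer known
                }
            }
            for (Block child : b.ebbSuccessors()) {
                propagate(child, facts);                                   // carry facts down the tree
            }
        }
    }

- A join point begins a new EBB of its own, so it would be processed with an empty set of facts, which is exactly the "discard what you know" behavior described above.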
Instruction scheduling
- This topic will not be covered on the exam.