Monday, November 2, 2009

SICP - Notes from Lecture 1B

Procedures and Processes:
Substitution Model
Covers Text Section 1.2

You can download the video lecture or stream it on MIT's OpenCourseWare site. It's also available on YouTube.

This lecture is presented by MIT’s Gerald Jay Sussman.

Part 1

This lecture concentrates on understanding how patterns of procedures and expressions cause particular behaviors and patterns of execution. In particular, we’re going to look at how different patterns of writing procedures can be used to write iterative and recursive processes. This was touched on in the previous lecture, but will be explored in much more depth here.

The substitution model is illustrated using the following definition for computing the sum of squares:
(define (sos x y)
(+ (sq x) (sq y)))

(define (sq x)
(* x x))

Substitution Rule

To evaluate an application:
  • Evaluate the operator to get procedure
  • Evaluate the operands to get arguments
  • Apply the procedure to the arguments
  • Copy the body of the procedure substituting the arguments supplied for the formal parameters of the procedure.
  • Evaluate the resulting new body.

When evaluating the operands, it doesn't really matter what order you do it in.

Key Quotes:
The key to Understanding complicated things is to know what not to look at, and what not to compute, and what not to think.

One of the things we have to learn how to do is ignore details.

Using the substitution rule (which is stressed to be an inaccurate model that we will refine later) and ignoring as much irrelevant detail as possible, we can show how the sum of squares is calculated.

(sos 3 4)
(+ (sq 3) (sq 4))
(+ (sq 3) (* 4 4))
(+ (sq 3) 16)
(+ (* 3 3) 16)
(+ 9 16)
25

In the first step we evaluate the sos operator to get its procedure. We substitute the values of the parameters into the definition. We can treat the + operator as primitive, so there’s no need to evaluate it. Next we need to evaluate the two sq operators. It doesn’t matter which order we evaluate them in. Each of these evaluations follows the same rule. We evaluate the operators to get the sq procedures, substituting the values of the arguments. At each step, we evaluate an expression as soon as it is reduced to primitives. Remember from the last lecture that this is called applicative-order evaluation. (This is in contrast to normal-order evaluation, where we would do the substitution of the expressions for the operands inside the body first. "Fully expand and then reduce.")


The IF Expression – A special form

One exception to our evaluation rules is the IF expression.

To evaluate an IF expression
  • Evaluate the predicate expression
  • If it yields TRUE
  • evaluate the consequent expression
  • otherwise
  • evaluate the alternative expression

The key here is that only one of the consequent or alternative expressions will ever be evaluated. This is different from the normal evaluation rule, which is why conditionals are termed a special form.

Peano Arithmetic (part 1)

The following procedure is given as an example of one way that addition could be defined in Scheme.
(define (+ x y)
(if (= x 0)
y
(+ (dec x) (inc y))))
Note: In the lecture, Prof. Sussman uses the 1+ and -1+ operators for increment and decrement. For readability, I've replaced them with inc and dec.

In this example we are counting down x, while counting up y. The real-world analogy given in the lecture is to imagine that you have two piles of stones, one with x number of stones, the other with y stones. With this definition for the + operator, we're just taking stones from the x pile and placing them in the y pile one at a time. When the size of the x pile reaches 0, the y pile has all the stones and we return that as the answer.


Part 2


During the first few minutes of this part of the lecture Professor Sussman takes a few moments to explain the idea of test strips in photography. He explains that a good photographer needs to be able to visualize how a photograph will look before they take a photo. In order to develop this ability, a photographer will take a series of photos under slightly different conditions in order to see what different effects are produced by each small change.

This introduces the concept of simple perturbational analysis in programming. The idea is basically the same as a photographer’s test strips. We make slight changes to a program and observe the change in output from before and after the change. By this method we can learn what kinds of program structure produce what kinds of effects in our processes.

Peano Arithmetic (part 2)

The following alternative procedure for addition is given:
(define (+ x y)
(if (= x 0)
y
(inc (+ (dec x) y))))

This alternative definition for the + operator is the same in all respects but one. In the last line of the procedure, all increment operations are deferred until all of the decrement operations have been completed.

In this example, you still remove the stones from the x pile one at a time. But instead of removing one stone from the x pile and immediately transferring it to the y pile, you hold each x stone in a buffer. Once all the stones are removed from the x pile, you then add them (again, one at a time) into the y pile. Incrementing the y pile is deferred until all decrements of the x pile are complete.

These two processes for addition have very different behaviors.

The complexities for the first procedure are

time = O(x)
space = O(1)

This is called an iterative process. The process will complete in an amount of time proportional to the size of x, and a constant amount of space is used.

The complexities for the second procedure are

time = O(x)
space = O(x)

This kind of process is called a linear recursion (linear because it is proportional to x in both time and space).

This brings up the difference between a recursive procedure and a recursive process. Both of the procedures for Peano Arithmetic that we've looked at have been defined recursively. A recursive procedure is simply one that is defined in terms of itself. A recursive process is one that defers some operation until a later stage of the process.

An iterative process carries all of its state in explicit variables. A recursive process keeps some of the state information as a deferred operation.


Part 3

The next part of the lecture starts with a recursive definition that most everyone will be familiar with, a procedure for generating Fibonacci numbers. You may remember that the Fibonacci sequence is generated by starting with 0 and 1, then the next number in the sequence is the sum of the previous two.

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, …

Mathematically, you can compute the nth Fibonacci number by summing the (n-1)th and (n-2)th Fibonacci numbers. When n is 0 or 1, the nth Fibonacci number is simply n. This leads directly to the following recursive definition for the nth Fibonacci term:
(define (fib n)
(if (< n 2)
n
(+ (fib (- n 1))
(fib (- n 2)))))
Can you come up with an iterative process to compute the same value? See section 1.2.2 Tree Recursion of the text for the iterative solution.

Note that the fib procedure above calls itself not once, but twice. This produces a different process structure than the recursive structure we’ve seen before. This pattern is called tree recursion, because for every call to fib, it branches into two more calls, which themselves branch into two more calls each, and so on until you reach the leaf nodes of the tree.



The complexities for this procedure are

time = O(2n)
space = O(n)

The time complexity for the Fibonacci procedure is exponential because for each increment you add to n, you very nearly double the size of the graph that is produced in the execution of the procedure. You can see in the graph above (taken from the text) that the entire subtree for calculating fib(4) would be repeated again if you were to expand the graph to compute fib(6).

The space complexity is only linear because as n increases by one, the height of the tree (which represents the number of deferred operations that need to stay in memory) also only increases by one.

Key Quote:
The way you construct a recursive process is by wishful thinking.

The lecture concludes with a demonstration of the Towers of Hanoi problem. Most of the time in this section is spent explaining the problem and manually demonstrating a solution with a real set of wooden pegs and disks. I won’t try to reproduce that here. It’s worth watching the lecture to see how a recursive solution is arrived at by “wishful thinking.” That is, Prof. Sussman builds a solution for moving n disks by assuming you can move (n – 1) disks, then recursively applying the procedure down to the point where only 1 disk needs to be moved.

The recursive solution is built on the assumption that a procedure to solve the problem already exists, but only for smaller instances of the problem. Sussman then decomposes the problem until he reaches the point where a solution does exist, moving one disk. This is the general solution for solving problems recursively.

2 comments:

Anonymous said...

Thanks for explaining the details. I had missed that the definition (+ a b) was being called each time.

Anonymous said...

Your summaries of these lectures are so clear and concise, and they really make understanding the material easier. Thanks!