Tower of Hanoi by Induction
The Tower of Hanoi (likely no relation to its namesake city in Vietnam) is a puzzle game that involves three rods and a set of disks of different sizes (or values). At the beginning of the game the disks are all stacked in decreasing size order on one rod, with the largest at the base.
   A       B   C
   |       |   |
   =       |   |
  ===      |   |
 =====     |   |
=======    ┴   ┴
The goal is to move the stack to one of the other rods under the following constraints.
- Only one disk can be moved at a time.
- Only the upper disk on a stack can be moved, to the top of another stack or to an empty rod.
- A disk cannot be moved on top of a smaller disk.
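As a quick sketch of these rules in code (a Python helper of my own devising, not part of the article's algorithm), a move can be validated before it is applied:

```python
def checked_move(source, target):
    """Move the top disk of `source` onto `target`, enforcing the rules:
    one disk at a time, only the top disk, never onto a smaller disk."""
    if not source:
        raise ValueError("cannot move from an empty rod")
    disk = source[-1]               # only the upper disk is accessible
    if target and target[-1] < disk:
        raise ValueError("cannot place a disk on a smaller one")
    target.append(source.pop())

A, B, C = [4, 3, 2, 1], [], []
checked_move(A, C)   # legal: disk 1 onto the empty rod C
print(A, C)          # → [4, 3, 2] [1]
```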
A programming analog of the puzzle involves a sequence of ordered values (usually integers) representing the disks, moved between three stacks (e.g. built on a linked list or an array) representing the rods.
A = [4, 3, 2, 1]
B = []
C = []
If you look for a solution to this problem online, the popular algorithm you'll find involves moving odd values clockwise and even values counter-clockwise, while obeying additional constraints that eventually lead the program to a solution.
In this article, we'll explore a different (and I think more interesting) approach to the solution, through induction. Later, I present an iterative adaptation of that recursive algorithm, which also differs from the popular iterative approach, as it relies on a different insight.
The challenge
Devise a program that transfers all values from an origin stack to a destination stack while respecting, at all times, the constraints laid out earlier. In our program, a disk will be represented by an integer value, while a stack can be built on a linked list or an array, with the associated complexity for the push and pop operations.
A = [v1, v2, v3, ..., vN]
B = []
C = []
We're trying to devise a function TOH() (short for "tower of hanoi") that can move N values from stack A (an origin stack) to stack B (a destination stack), aided only by stack C (a temporary stack), within the constraints established earlier. The signature of such a function might look like this,
TOH (N, origin, destination, temporary) {
...
}
and although the function would not return anything per se, the end result of the operation would be that the top N values from origin are moved to destination (provided no constraint was violated), stacked in their original order.
A magic() function
Let's presume that there already exists a function named magic(), capable of moving exactly N-1 values while respecting the same constraints. Its signature would look something like this:
magic (origin, destination, temporary) {
...
}
Aided by this function, here are the steps required to move all N values from A to B:
- First, use magic() to move N-1 values (i.e. v2 up to and including vN) from A to C, using B as a temporary stack.
magic(A, C, B)
A = [v1]
B = []
C = [v2, v3, ..., vN]
- Second, move the last value in A to B.
value = pop(A)
push(B, value)
A = []
B = [v1]
C = [v2, v3, ..., vN]
- Finally, use magic() again to move the N-1 values previously placed in C over to B, this time using A as the temporary stack.
magic(C, B, A)
A = []
B = [v1, v2, v3, ..., vN]
C = []
The details of how magic() handles the N-1 subproblem within the constraints are not important for now. We just delegate the subproblem and presume that it somehow gets solved. Substituting A, B, and C above with their generic counterparts, our TOH() function can be formalized as follows:
TOH (N, origin, destination, temporary) {
magic(origin, temporary, destination)
...
}
Note in these steps that the temporary stack in the TOH(N) problem is first used as the destination stack in the initial call to magic(), while the destination stack is used as the temporary stack.
Next, assuming that the magic() operation succeeded, we can now move the last and largest value in our origin stack to the destination stack, which is currently empty.
TOH (N, origin, destination, temporary) {
magic(origin, temporary, destination)
value = pop(origin)
push(destination, value)
...
}
Finally, let's move all the values currently in our temporary stack on top of the one we just placed in the destination. This time though, the temporary and origin stacks in TOH(N) serve respectively as the origin and temporary holders in the second magic() call.
TOH (N, origin, destination, temporary) {
magic(origin, temporary, destination)
value = pop(origin)
push(destination, value)
magic(temporary, destination, origin)
}
Why it works
Whatever magic() does to move N-1 values, we know that as far as TOH(N) is concerned, its constraints will be respected. When we first call magic(), the last value in the origin stack is larger than any value being moved within magic() and will remain untouched over the course of that operation. At the onset of the second magic() call, the destination now contains the largest possible value (which will remain untouched by magic()), while the origin stack, which will now serve as a temporary space for magic(), is empty. Again, while magic() operates within its own little bubble, as far as TOH(N) is concerned, the constraints are being respected.
Sorcery (aka induction)
So, we've solved TOH(N) with the help of some mysterious magic() function. But come to think of it, if TOH() can now move an arbitrary number N of values, it should logically also be able to move N-1 values by itself, which is the work currently being done by magic(). Furthermore, the two functions have nearly the same signature: if we call TOH() and pass N-1 as the number of values to move, they are equivalent. Let's do some alchemy, replace magic() with TOH(), and see what happens.
TOH (N, origin, destination, temporary) {
TOH(N-1, origin, temporary, destination)
value = pop(origin)
push(destination, value)
TOH(N-1, temporary, destination, origin)
}
Something from nothing! We have a recursive algorithm. Now we just need a way to limit the recursion.
If we observe how the calls recurse, we can see that the value passed each time as N to TOH() will eventually drop below zero, which makes no sense, since N indicates the number of values to move. We need to provide a condition for the recursion to stop (aka a base case). Let's observe what happens as N approaches zero.
For instance, what would be necessary to successfully move a stack of only one value? Looking at the algorithm above, we can see that the two lines between the two recursive calls would be enough to solve the case N=1:
value = pop(origin)
push(destination, value)
Therefore, when solving for N=1, we just need to ensure that the auxiliary recursive calls to solve for N=0 do nothing (i.e. they simply return). Easy enough.
TOH(N, origin, destination, temporary):
if N==0:
return
TOH(N-1, origin, temporary, destination)
value = pop(origin)
push(destination, value)
TOH(N-1, temporary, destination, origin)
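The pseudocode above translates almost line for line into runnable Python, using lists as stacks (append/pop standing in for push/pop):

```python
def toh(n, origin, destination, temporary):
    """Move the top n values from origin to destination,
    using temporary as scratch space."""
    if n == 0:
        return
    toh(n - 1, origin, temporary, destination)   # clear the way
    destination.append(origin.pop())             # move the largest value
    toh(n - 1, temporary, destination, origin)   # restack on top of it

A, B, C = [4, 3, 2, 1], [], []
toh(len(A), A, B, C)
print(A, B, C)  # → [] [4, 3, 2, 1] []
```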
We have a solution for the N=1 problem. Do we have one for N=2? Since the function can delegate the N=1 subproblem to a recursive call, it can also solve N=2. What about N=3? Well, it delegates N=2 to a recursive call, and so it can solve N=3. And N=4, N=5, ..., N=K? To solve the N=K problem, it delegates the N=K-1 subproblem to a recursive call, which delegates N=K-2, ..., which delegates N=2, which delegates N=1, and eventually the N=K problem is solved.
Complexity analysis
Observe that each problem of size N requires solving the problem of size N-1 twice, plus one direct move.
N -> Time(N)
0 -> 1
1 -> 1 + 2 * Time(0) = 3
2 -> 1 + 2 * Time(1) = 7
3 -> 1 + 2 * Time(2) = 15
4 -> 1 + 2 * Time(3) = 31
...
K -> 1 + 2 * Time(K-1) = 2^(K+1) - 1
The time complexity is therefore exponential: O(2^N).
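The closed form is easy to check numerically; here's a short sketch unrolling the recurrence Time(K) = 1 + 2 * Time(K-1):

```python
def time_cost(k):
    """Unroll Time(0) = 1, Time(K) = 1 + 2 * Time(K-1)."""
    t = 1
    for _ in range(k):
        t = 1 + 2 * t
    return t

for k in range(10):
    assert time_cost(k) == 2 ** (k + 1) - 1  # matches the closed form
print(time_cost(4))  # → 31
```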
The algorithm reuses the same three stacks across recursive calls, so it requires no more storage than the original values being moved. Setting aside the recursion call stack (which grows to depth N), the auxiliary space complexity is thus O(1).
Iterative implementation
Before we delve into this part, it should be noted that the above solution to the Tower of Hanoi problem is based purely on inductive reasoning, not on the usual set of rules (odd clockwise, even counter-clockwise, etc.). One could derive the former without ever knowing of the latter.
It's often a good idea to approach the iterative reimplementation of an algorithm by recognizing a recursive pattern for which there is a known iterative equivalent, and simply applying that implementation. The recursive implementation of TOH() exhibits one such interesting pattern.
TOH(N, A, B, C):
if base_case(N, A, B, C):
return
TOH(N-1, A, C, B)
process(N, A, B, C)
TOH(N-1, C, B, A)
If you're familiar with binary tree traversals, you may recognize the above as being similar to the "in-order-traversal" algorithm, which looks as follows:
IOT(node):
if base_case(node):
return
IOT(left(node))
process(node)
IOT(right(node))
It turns out that there's an iterative implementation of "in-order-traversal", which goes as follows.
IOT(node):
stack = [node]
next = left(node)
while not_empty(stack) or not_nil(next):
if not_nil(next):
push(stack, next)
next = left(next)
else:
node = pop(stack)
process(node)
next = right(node)
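As a sanity check, here is that iterative traversal rendered in runnable Python, with tree nodes as (left, value, right) tuples (a representation of my own choosing):

```python
def iot(root):
    """Iterative in-order traversal; collects values left-to-right."""
    out, stack, nxt = [], [], root
    while stack or nxt is not None:
        if nxt is not None:
            stack.append(nxt)
            nxt = nxt[0]                 # descend into the left child
        else:
            left, value, right = stack.pop()
            out.append(value)            # process(node)
            nxt = right                  # then visit the right subtree
    return out

#        2
#       / \
#      1   3
tree = ((None, 1, None), 2, (None, 3, None))
print(iot(tree))  # → [1, 2, 3]
```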
But what would be the equivalent of a "node" in TOH()? What would be its "left child"? And its "right child"? Comparing the elements in the recursive IOT() to those in TOH(), we can observe that the "node" equivalent is the tuple (N, A, B, C). That is, the node is made up of the specific roles assigned to each of A, B, and C (i.e. origin, destination, temporary), as well as the value of N, over the span of a call. The left child is determined by the mapping (N, A, B, C) -> (N-1, A, C, B), while the right child is similarly given by the mapping (N, A, B, C) -> (N-1, C, B, A).
Since the value of N decreases along the parent->child relationship, that's where the base case ought to be tested and handled.
Our iterative implementation of TOH() thus goes as follows:
TOH(A, B, C):
stack = []
N = sizeof(A)
next = (N, A, B, C)
while not_empty(stack) or not_nil(next):
if not_nil(next):
push(stack, next)
next = left(next)
else:
node = pop(stack)
move_value(node)
next = right(node)
left(node):
(N, A, B, C) = node
next_N = N - 1
if next_N==0: // base case
return nil
return (next_N, A, C, B)
right(node):
(N, A, B, C) = node
next_N = N - 1
if next_N==0: // base case
return nil
return (next_N, C, B, A)
move_value(node):
(N, A, B, C) = node
value = pop(A)
push(B, value)
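Put together in runnable Python (lists as stacks, nodes as tuples, names following the pseudocode; like the pseudocode, this assumes at least one value to move):

```python
def left(node):
    n, a, b, c = node
    if n - 1 == 0:        # base case: no left child
        return None
    return (n - 1, a, c, b)

def right(node):
    n, a, b, c = node
    if n - 1 == 0:        # base case: no right child
        return None
    return (n - 1, c, b, a)

def toh_iter(a, b, c):
    """Iterative TOH via in-order traversal of the call tree."""
    stack, nxt = [], (len(a), a, b, c)
    while stack or nxt is not None:
        if nxt is not None:
            stack.append(nxt)
            nxt = left(nxt)
        else:
            node = stack.pop()
            _, origin, destination, _ = node
            destination.append(origin.pop())   # move_value(node)
            nxt = right(node)

A, B, C = [4, 3, 2, 1], [], []
toh_iter(A, B, C)
print(B)  # → [4, 3, 2, 1]
```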
The time complexity is the same as for the recursive approach, since this is an iterative emulation of the same traversal. The algorithm requires a stack to track the pending nodes of the traversal tree. The tree has 2^N - 1 nodes, but the number of nodes tracked at once is bounded by the height of the tree, i.e. the length of a root-to-leaf path. So at any given time the stack holds at most N nodes, for an auxiliary space complexity of O(N). So this algorithm, although an interesting solution, is still costlier in auxiliary space than the common iterative solution.