• ## 30. Fun with MI in C++

The only features originally allowed into C++ were those which had efficient implementations. This sounds great, but the results were sometimes dubious. Consider the following program, which uses multiple inheritance.

#include <cstdio>
#include <inttypes.h>

using namespace std;

class Left {
    int32_t a, b;
};

class Right {
    int32_t c, d;
};

class Bottom : public Left, public Right {
};

int main(int argc, char *argv[]) {
    Bottom *bottom = new Bottom();
    Left *left = bottom;
    Right *right = bottom;

    // %p expects a void *, so cast each pointer explicitly.
    printf("left   -> %p\n", (void *)left);
    printf("bottom -> %p\n", (void *)bottom);
    printf("right  -> %p\n", (void *)right);

    if (left == bottom && bottom == right) {
        printf("left == bottom == right\n");
    } else {
        printf("!(left == bottom == right)\n");
    }

    delete bottom;
    return 0;
}

The first three lines of output are:

left   -> 0x24dd010
bottom -> 0x24dd010
right  -> 0x24dd018

Right off the bat, we see that something happened to right. The statement right = bottom did something special: instead of copying the address in bottom, it set right to the address 8 bytes past it. To understand why, we need to look at the memory representation for *bottom.

0x24dd00c ->  |    ...    |
0x24dd010 ->  | int32_t a |   <- bottom, left
0x24dd014 ->  | int32_t b |
0x24dd018 ->  | int32_t c |   <- right
0x24dd01c ->  | int32_t d |
0x24dd020 ->  |    ...    |

A Bottom object consists of a Left object, followed by a Right object. In turn, Left and Right objects consist of their own fields. So, a Bottom consists of four int32_ts.

One of the principles behind C++ was that access to object fields should be as fast as possible. In our case, it’s possible to avoid any indirection and make it as fast as normal variable access: since the compiler knows the memory layout, it replaces an access to bottom->a with a load from the address held in bottom, an access to bottom->b with a load from that address plus 4 bytes, and so on. This works fine for bottom, which actually points to a Bottom object, and for left, because the first part of a Bottom object is a Left object.

What about right? We want right->d to be a load from the address in right plus 4 bytes, just as it would be for a standalone Right object, but it’s the second part of a Bottom object that is a Right object. For this to work, the statement right = bottom cannot simply copy the address in bottom; instead, it converts the Bottom* to a Right* by adding the byte offset of Right within Bottom.

Now that we understand what’s happening, we can ask the next question: what is the last line of output? Is it left == bottom == right? Or !(left == bottom == right)? It’s the former: in each comparison the compiler first converts the Bottom* operand to the other pointer’s type, applying the same offset as in the assignment, so the adjusted addresses match. The offsetting stays hidden from the programmer unless the raw addresses are printed or compared.

I generally like C++, but this gives me mixed feelings. A part of me thinks that this is awesome, and another part screams in horror.

This article is a very good description of C++’s multiple and virtual inheritance with a focus on memory layouts.

• ## 29. Escape Analysis in Go

Returning a pointer to a local variable is legal in Go. As a C programmer, the following looks like an error to me, but it’s perfectly alright in Go.

func NewFile(fd int, name string) *File {
    f := File{fd, name, nil, 0}
    return &f
}

Above we have a function that defines the local variable f as a File struct, and then returns the address of that local variable. If this were C, the memory layout would look like this:

    The Stack

|      ...      |
|               |
+---------------+
|               |  \
|               |   } Stack frame of calling function
|               |  /
+---------------+
|               |  \
|  f : File     |   } Stack frame of NewFile
|               |  /
+---------------+

In C, returning from NewFile would destroy its stack frame along with the f defined in it, leaving the caller with a dangling pointer. But this isn’t C. To quote Effective Go,

Note that, unlike in C, it’s perfectly OK to return the address of a local variable; the storage associated with the variable survives after the function returns.

This raises the question: how? How could the storage of local variables survive function returns? The FAQ explains:

When possible, the Go compilers will allocate variables that are local to a function in that function’s stack frame. However, if the compiler cannot prove that the variable is not referenced after the function returns, then the compiler must allocate the variable on the garbage-collected heap to avoid dangling pointer errors. In the current compilers, if a variable has its address taken, that variable is a candidate for allocation on the heap. However, a basic escape analysis recognizes some cases when such variables will not live past the return from the function and can reside on the stack.

In other words, the Go compiler is free to allocate any local variable whose address is taken on the garbage-collected heap (it actually did this for all of them until late 2011); however, it’s clever enough to recognize some cases when it’s safe to allocate them on the stack instead.

The code that does the “escape analysis” lives in src/cmd/gc/esc.c. Conceptually, it tries to determine if a local variable escapes the current scope; the two simplest cases where this happens are when a variable’s address is returned, and when its address is assigned to a variable in an outer scope. If a variable escapes, it has to be allocated on the heap; otherwise, it’s safe to put it on the stack.

Interestingly, this applies to new(T) allocations as well. If they don’t escape, they’ll end up being allocated on the stack. Here’s an example to clarify matters:

var intPointerGlobal *int = nil

func Foo() *int {
    anInt0 := 0
    anInt1 := new(int)
    _, _ = anInt0, anInt1 // use both so the compiler accepts the declarations

    anInt2 := 42
    intPointerGlobal = &anInt2

    anInt3 := 5

    return &anInt3
}

Above, anInt0 and anInt1 do not escape, so they are allocated on the stack; anInt2 and anInt3 escape, and are allocated on the heap.