scvalex.net

30. Fun with MI in C++

The only features originally allowed into C++ were those which had efficient implementations. This sounds great, but the results were sometimes dubious. Consider the following program which uses multiple inheritance.

#include <cstdio>
#include <inttypes.h>

using namespace std;

class Left {
    int32_t a, b;
};

class Right {
    int32_t c, d;
};

class Bottom : public Left, public Right {
};

int main(int argc, char *argv[]) {
    Bottom *bottom = new Bottom();
    Left *left = bottom;
    Right *right = bottom;

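    // %p expects a void*, so convert explicitly; the printed address is unchanged.
    printf("left   -> %p\n", static_cast<void *>(left));
    printf("bottom -> %p\n", static_cast<void *>(bottom));
    printf("right  -> %p\n", static_cast<void *>(right));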

    if (left == bottom && bottom == right) {
        printf("left == bottom == right\n");
    } else {
        printf("!(left == bottom == right)\n");
    }

    delete bottom;
    return 0;
}

The first three lines of output are:

left   -> 0x24dd010
bottom -> 0x24dd010
right  -> 0x24dd018

Right off the bat, we see that something happened to right. The initialization Right *right = bottom did something special: instead of copying the address in bottom, it stored that address plus 8 bytes. To understand why, we need to look at the memory representation for *bottom.

0x24dd00c ->  |    ...    |
0x24dd010 ->  | int32_t a |   <- bottom, left
0x24dd014 ->  | int32_t b |
0x24dd018 ->  | int32_t c |   <- right
0x24dd01c ->  | int32_t d |
0x24dd020 ->  |    ...    |

A Bottom object consists of a Left object, followed by a Right object. In turn, Left and Right objects consist of their own fields. So, a Bottom consists of four int32_ts.
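The layout described above can be checked from inside a program. The following is a minimal sketch, not part of the original listing: the helper names left_offset and right_offset are made up for illustration, and the exact numbers depend on the compiler and ABI, since the C++ standard does not fix where base subobjects are placed.

```cpp
#include <cstddef>
#include <cstdint>

class Left {
    std::int32_t a, b;
};

class Right {
    std::int32_t c, d;
};

class Bottom : public Left, public Right {
};

// Hypothetical helper: byte distance from the start of a Bottom object
// to its Left base subobject.
std::ptrdiff_t left_offset(const Bottom &obj) {
    const Left *as_left = &obj;  // implicit derived-to-base conversion
    return reinterpret_cast<const char *>(as_left)
         - reinterpret_cast<const char *>(&obj);
}

// Hypothetical helper: byte distance from the start of a Bottom object
// to its Right base subobject.
std::ptrdiff_t right_offset(const Bottom &obj) {
    const Right *as_right = &obj;  // implicit derived-to-base conversion
    return reinterpret_cast<const char *>(as_right)
         - reinterpret_cast<const char *>(&obj);
}
```

With the usual Linux and Windows compilers, left_offset should report 0 and right_offset should report 8 for any Bottom object, and sizeof(Bottom) should be 16, which matches the diagram.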

One of the principles behind C++ was that access to object fields should be as fast as possible. In our case, it’s possible to avoid any indirection and make it as fast as normal variable access: since the compiler knows the memory layout, it replaces an access to bottom->a by a load from the address in bottom, an access to bottom->b by a load from 4 bytes past that address, and so on. This works fine for bottom, which actually points to a Bottom object, and for left because the first part of a Bottom object is a Left object.
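To make the "address plus fixed offset" idea concrete, here is a small sketch that reads a field by hand. It assumes a variant of Left with public members (so that offsetof is allowed), and read_b_by_offset is a hypothetical helper, not something the compiler exposes; real generated code does this with a single load instruction.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Public-member variant of the article's Left (an assumption made for
// this sketch), so that offsetof and aggregate initialization work.
struct Left {
    std::int32_t a, b;
};

// Hypothetical helper: reads Left::b as "start of object + byte offset",
// which is conceptually what the compiler emits for obj.b.
std::int32_t read_b_by_offset(const Left &obj) {
    const char *base = reinterpret_cast<const char *>(&obj);
    std::int32_t value;
    // memcpy avoids aliasing problems when reading through a char pointer.
    std::memcpy(&value, base + offsetof(Left, b), sizeof value);
    return value;
}
```

For a Left initialized as Left{1, 2}, read_b_by_offset returns the same value as a plain access to b.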

What about right? We want right->d to be replaceable by a load from 4 bytes past right, but it’s the second part of a Bottom object that is a Right object. For this to work, the initialization Right *right = bottom cannot simply copy the address; instead, it converts Bottom* to Right* by adding the byte offset of Right within Bottom (8 here).
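The adjustment works in both directions, and null pointers are left alone. A minimal sketch, with to_right and to_bottom as made-up wrapper names around the ordinary C++ conversions:

```cpp
#include <cstdint>

class Left {
    std::int32_t a, b;
};

class Right {
    std::int32_t c, d;
};

class Bottom : public Left, public Right {
};

// Upcast: the compiler adds the offset of Right within Bottom.
// A null Bottom* becomes a null Right*; no offset is applied to it.
Right *to_right(Bottom *bottom) {
    return bottom;  // same as static_cast<Right *>(bottom)
}

// Downcast: static_cast subtracts the same offset again.
// Only valid when `right` really points at the Right part of a Bottom.
Bottom *to_bottom(Right *right) {
    return static_cast<Bottom *>(right);
}
```

Given a Bottom object b, to_bottom(to_right(&b)) yields &b again. A reinterpret_cast between the two pointer types would skip the adjustment and give a Right* that still points at the Left part, which is why it should not be used for this conversion.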

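Now that we understand what’s happening, we can ask the next question: what is the last line of output? Is it left == bottom == right? Or !(left == bottom == right)? It’s the former. Before comparing pointers of different types, the compiler converts bottom to the base-class pointer type on the other side, so bottom == right compares right against bottom adjusted by the same offset. The offsetting stays hidden from the programmer unless the raw addresses are printed or compared as void*.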
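The difference between the typed comparison and a comparison of raw addresses can be seen side by side. In this sketch, equal_typed and equal_raw are hypothetical names; the second result assumes the usual layout where Right sits at a non-zero offset inside Bottom.

```cpp
#include <cstdint>

class Left {
    std::int32_t a, b;
};

class Right {
    std::int32_t c, d;
};

class Bottom : public Left, public Right {
};

// Typed comparison: bottom is first converted to Right*, which applies
// the base-class offset, so both sides refer to the same subobject.
bool equal_typed(Bottom *bottom, Right *right) {
    return bottom == right;
}

// Raw comparison: converting each pointer to void* keeps its address
// as it is, so the offset is no longer hidden.
bool equal_raw(Bottom *bottom, Right *right) {
    return static_cast<void *>(bottom) == static_cast<void *>(right);
}
```

For a Right* obtained from the same Bottom object, equal_typed returns true while equal_raw returns false on common compilers. Note that left == right does not compile at all, because Left and Right are unrelated types.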

I generally like C++, but this gives me mixed feelings. A part of me thinks that this is awesome, and another part screams in horror.


This article is a very good description of C++’s multiple and virtual inheritance with a focus on memory layouts.