
non-virtual thunk for Virtual Function in multiple inheritance

Reposted from http://thomas-sanchez.net/computer-sciences/2011/08/15/what-every-c-programmer-should-know-the-hard-part/

What every C++ programmer should know, The hard part

Previously, I explained how C++ handles classes and the inheritance between them, but I did not cover how virtual methods are handled.

Virtual methods add a lot of complexity. C++ is compiled, and when a binary is linked against a library, both must speak the same language: they have to share the same ABI. The creators of C++ had to find a way to keep metadata about the manipulated classes available throughout the lifetime of the program.

They chose Virtual Tables.

The Virtual Table

When a C++ program is compiled, the binary embeds some information about the classes the program manipulates. When a class inherits from an interface, the actual implementation of each method must always be reachable. Virtual Tables (VTables) are generated during compilation; they can be seen as arrays of method pointers.

Let’s take an example:

```cpp
#include <iostream>

struct Interface
{
    Interface() : i(0x424242) {}
    virtual void test_method() = 0;
    virtual ~Interface(){}
    int i;
};

struct Daughter : public Interface
{
    void test_method()
    {
        std::cout << "This is a call to the method" << std::endl;
        std::cout << "This: " << this << std::endl;
    }
};

int main()
{
    Daughter* d = new Daughter;
    Interface* i = d;

    i->test_method();

    std::cout << sizeof(Daughter) << std::endl;
    // Peek at the raw memory of the object (not portable, exploration only):
    std::cout << *((void**)i) << std::endl;   // first word of the object
    std::cout << ((void**)i)[1] << std::endl; // second word: the field i (plus padding)

    delete d;
}
```

Recall that all the tests were run on 64-bit Linux.

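The size of a Daughter instance is not 4 bytes (the size of its only field, an int), as we might expect, but 16 bytes: 8 bytes for a hidden pointer, 4 bytes for i, and 4 bytes of padding to keep the object 8-byte aligned. The memory dump shows that the first field of the object is not the value of i but a strange value, and our field comes right after it. This 'strange' value is actually a pointer, and more precisely a pointer into our own binary: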

```
nm -C test | grep 400d
0000000000400de0 V vtable for Daughter
```

I will come back later to why there is a difference of a few bytes between the printed pointer and the symbol address. So this pointer gives the location of the Daughter VTable, and we can now inspect its contents.

As I said, a VTable is a kind of array of method pointers.

Getting a pointer to it is simply:

```cpp
size_t** vtable = *(size_t***)i;
std::cout << vtable[0] << std::endl;
```

If we look up the new address printed on the output, we can see that it is indeed our method pointer:

```
nm -C test | grep -E 400c6a
0000000000400c6a W Daughter::test_method()
```

We can go a little further and call the method through this pointer:

```cpp
typedef void (*VtablePtr) (Daughter*);
VtablePtr ptr = (VtablePtr)vtable[0];
ptr(d);
```

VTables are built during compilation. When the compiler sees a virtual method in a class, it starts building a VTable associated with that class. When this class is inherited by another one, the derived class gets its own copy of that VTable, and its instances receive a pointer to it. Each entry of the VTable is filled in when the actual definition of the method is encountered; it is always the last definition in the inheritance chain (the most-derived override) that is kept.

The index of a method in the VTable follows its order of declaration in the class, which is why it is very important that every part of a project is compiled against consistent headers. It is always embarrassing when the wrong method gets called in a project and nobody knows why…
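The ordering rule, and the "last definition wins" rule, can be checked with a small experiment. This is a sketch under stated assumptions, not code from the original article: Base, Derived and call_slot are hypothetical names, and the code assumes the Itanium C++ ABI used by GCC and Clang on Linux (the VPTR is the first word of the object and the this pointer is passed as an ordinary first argument). Entry 0 should hold first() and entry 1 second(); in a Derived object, entry 1 should point to the override.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical classes: Base declares two virtual methods,
// Derived overrides only the second one.
struct Base {
    virtual int first()  { return 1; }
    virtual int second() { return 2; }
    virtual ~Base() {}
};

struct Derived : Base {
    int second() { return 20; }  // replaces Base::second in Derived's VTable
};

// Calls entry `slot` of obj's VTable directly.
// Assumes the Itanium C++ ABI (GCC/Clang on Linux): the VPTR is the first
// word of the object and `this` is passed as an ordinary first argument.
// Not portable C++: for exploration only.
int call_slot(Base* obj, std::size_t slot) {
    void** vtable;
    std::memcpy(&vtable, static_cast<const void*>(obj), sizeof vtable);  // read the VPTR
    typedef int (*Entry)(Base*);
    return reinterpret_cast<Entry>(vtable[slot])(obj);
}
```

For a Derived object d, call_slot(&d, 0) should return 1 (the inherited Base::first) and call_slot(&d, 1) should return 20. Swapping the declaration order of first() and second() in Base would swap the two slots, which is exactly what goes wrong when two translation units see different versions of a header.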

Here is the complete code:

```cpp
// Relies on GCC on 64-bit Linux (Itanium C++ ABI); the casts are not portable C++.
#include <cstddef>
#include <iostream>

struct Interface
{
    Interface() : i(0x424242) {}
    virtual void test_method() = 0;
    virtual ~Interface(){}
    int i;
};

struct Daughter : public Interface
{
    void test_method()
    {
        std::cout << "This is a call to the method" << std::endl;
        std::cout << "This: " << this << std::endl;
    }
};

int main()
{
    Daughter* d = new Daughter;
    Interface* i = d;

    i->test_method();

    std::cout << sizeof(Daughter) << std::endl;
    std::cout << *((void**)i) << std::endl;   // the VPTR
    std::cout << ((void**)i)[1] << std::endl; // the field i (plus padding)

    // Follow the VPTR and print the first entry of the VTable.
    size_t** vtable = *(size_t***)i;
    std::cout << vtable[0] << std::endl;      // address of Daughter::test_method

    // Call that entry directly, passing the object as the hidden `this` argument.
    typedef void (*VtablePtr) (Daughter*);
    VtablePtr ptr = (VtablePtr)vtable[0];
    ptr(d);

    delete d;
}
```

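In conclusion, once virtual methods are involved, an instance should be pictured like this:

```
VPTR
Base1
Daughter
```

And each instance is heavier by sizeof(void*)*nb_of_vptr bytes, plus whatever padding alignment requires (in the example above, 8 bytes of VPTR plus 4 bytes of padding).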
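To put numbers on that cost, here is a hedged sketch with hypothetical structs; the expected sizes assume x86-64 Linux (8-byte pointers, 4-byte int). Plain should be 4 bytes, WithVirtual 16 (8 for the VPTR, 4 for i, 4 of padding), and Both, which carries two VPTRs because it has two polymorphic bases, 32.

```cpp
#include <iostream>

// Hypothetical structs that differ only by the presence of a virtual method.
struct Plain {
    void f() {}
    int i;
};

struct WithVirtual {
    virtual void f() {}
    int i;
};

// Two polymorphic bases: a Both object contains two VPTRs.
struct A { virtual void a() {} int x; };
struct B { virtual void b() {} int y; };
struct Both : A, B {};

void print_sizes() {
    std::cout << "Plain:       " << sizeof(Plain) << '\n'
              << "WithVirtual: " << sizeof(WithVirtual) << '\n'
              << "Both:        " << sizeof(Both) << '\n';
}
```

The padding is why the growth can exceed the pointer count alone: two VPTRs and two ints add up to 24 bytes, but each base part is padded to 16, giving 32.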

Virtual in multiple inheritance

As usual, let's start with some trivial code:

```cpp
#include <iostream>

struct Mother
{
    virtual void mother()=0;
    virtual ~Mother() {}
    int i;
};

struct Father
{
    virtual void father()=0;
    virtual ~Father() {}
    int j;
};

struct Daughter : public Mother, public Father
{
    void mother()
    { std::cout << "Mother: " << this << std::endl; }

    void father()
    { std::cout << "Father: " << this << std::endl; }

    int k;
};

int main()
{
    Daughter* d = new Daughter;
    Mother* m = d;
    Father* f = d;   // implicit conversion: f points to the Father part of *d

    std::cout << "Daughter: " << (void*)d << std::endl;
    std::cout << "Father  : " << (void*)f << std::endl;
    std::cout << sizeof(*d) << std::endl;

    // First word of each part: its VTable pointer
    std::cout << *((void**)d) << std::endl;
    std::cout << *((void**)f) << std::endl;

    delete d;
}
```

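As you can see, the two VTables used are different, and so are the two addresses printed for d and f: converting a Daughter* into a Father* moves the pointer to the Father part of the object. When objects are manipulated, it is rarely (never?) through the concrete type, but through the abstract one. With multiple inheritance, a Daughter can be handled as a Mother or as a Father, so when it is used as a Father and the actual implementation lives in Daughter, the method must still be reachable. That is why there is a second VTable pointer, stored in the Father part of the object.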

However, when a Daughter instance is used through a Father pointer, the Daughter method cannot be called directly: that pointer refers to the Father part of the object, so it must first be adjusted to point to the Daughter instance again. Thunk functions exist to solve this problem.
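The adjustment can be observed from plain C++. In this sketch the seen_this field, Offsets and observe_adjustment are hypothetical additions to the article's classes, and the expected numbers assume x86-64 Linux with GCC or Clang: a Father* points 16 bytes into the Daughter object, yet inside Daughter::father() the this pointer is back at offset 0.

```cpp
#include <cstddef>

// The article's classes, with one hypothetical addition: seen_this records
// what the this pointer looks like inside Daughter::father().
struct Mother {
    virtual void mother() = 0;
    virtual ~Mother() {}
    int i;
};

struct Father {
    virtual void father() = 0;
    virtual ~Father() {}
    int j;
};

struct Daughter : public Mother, public Father {
    void mother() {}
    void father() { seen_this = this; }
    int k;
    const void* seen_this;
};

struct Offsets {
    std::ptrdiff_t father_part;     // where a Father* points inside the Daughter
    std::ptrdiff_t this_in_father;  // where this points inside Daughter::father()
};

Offsets observe_adjustment() {
    Daughter d;
    Father* f = &d;  // implicit conversion: moves to the Father part of d
    f->father();     // virtual call made through a Father*
    const char* start = reinterpret_cast<const char*>(&d);
    Offsets o;
    o.father_part = reinterpret_cast<const char*>(f) - start;
    o.this_in_father = static_cast<const char*>(d.seen_this) - start;
    return o;
}
```

The two offsets differ because something between the call site and the method body converted the Father pointer back into a Daughter pointer.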

If we print the first entry of the Father VTable and disassemble the code at that location, we get this:

```
0000000000400cf4 <non-virtual thunk to Daughter::father()>:
  400cf4:       48 83 ef 10             sub    $0x10,%rdi
  400cf8:       eb 00                   jmp    400cfa <Daughter::father()>
```

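These two instructions adjust the this pointer (passed in %rdi) by subtracting 0x10, the offset of the Father part inside Daughter, which here is the size of the Mother class, so that it matches the Daughter instance again; then they jump to the real method. Therefore, with multiple inheritance and virtual methods, you can pick up extra indirection very easily: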

  • Get the VTable;
  • Move to the wanted method (apply an offset to the VTable pointer, for example 8 bytes to get the second method);
  • Call the method;
  • Adjust the this pointer;
  • Jump to the actual method definition.
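These steps can be reproduced by hand. This is a non-portable sketch that assumes the Itanium C++ ABI on x86-64 (GCC or Clang on Linux); seen_this and call_father_by_hand are hypothetical names. It reads the VTable pointer of the Father part, calls entry 0 (the non-virtual thunk) with the unadjusted Father pointer, and reports where this ended up inside Daughter::father().

```cpp
#include <cstddef>
#include <cstring>

// The article's classes, with one hypothetical addition: seen_this records
// what the this pointer looks like inside Daughter::father().
struct Mother {
    virtual void mother() = 0;
    virtual ~Mother() {}
    int i;
};

struct Father {
    virtual void father() = 0;
    virtual ~Father() {}
    int j;
};

struct Daughter : public Mother, public Father {
    void mother() {}
    void father() { seen_this = this; }
    int k;
    const void* seen_this;
};

// Assumes the Itanium C++ ABI on x86-64; not portable C++. Returns the offset
// of `this`, as seen inside Daughter::father(), from the start of the object.
std::ptrdiff_t call_father_by_hand() {
    Daughter d;
    Father* f = &d;  // points to the Father part of d

    void** vtable;   // get the VTable of the Father part
    std::memcpy(&vtable, static_cast<const void*>(f), sizeof vtable);

    typedef void (*Entry)(Father*);
    Entry entry = reinterpret_cast<Entry>(vtable[0]);  // first entry: the thunk for father()

    entry(f);  // call it; the thunk adjusts this, then jumps to Daughter::father()

    return static_cast<const char*>(d.seen_this) - reinterpret_cast<const char*>(&d);
}
```

A return value of 0 means the thunk did its job: it received a pointer 16 bytes into the object and handed Daughter::father() the start of the object.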

Method Pointer

Yes, method pointers have a cost. Unlike C, where a function pointer is just an address, C++ has to deal with the difference between:

  • Which (sub)object the method is accessed from;
  • Whether the method is virtual.

The first point requires a pointer adjustment. The second point, well, requires a lot of things.

Firstly, the size of a method pointer is 16 bytes (against 8 for a function pointer in C). A method pointer carries three pieces of information:

  1. Address/index
  2. Offset
  3. virtual?

The first two parts take 8 bytes each. The third has no storage of its own: it is merged into the first part. If the lowest bit of the first part is set, that part should be read as a position in the VTable (the byte offset of the method's entry, plus 1); otherwise it is the address of the method. This is the Itanium C++ ABI used by GCC on 64-bit Linux; on ARM, the flag is kept in the offset part instead.
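The layout can be inspected by copying a method pointer into a plain struct. RawMethodPtr, decode and is_virtual are hypothetical helpers, and the layout they assume is the Itanium C++ ABI on x86-64; treat this as an exploration tool, not portable code. For &Father::father viewed as a Daughter method, one would expect ptr == 1 (entry 0, plus the flag) and adj == 16 (the offset of the Father part).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Classes shaped like the article's, with a hypothetical non-virtual
// method plain() added for comparison.
struct Mother {
    virtual void mother() {}
    virtual ~Mother() {}
    int i;
};

struct Father {
    virtual void father() {}
    void plain() {}
    virtual ~Father() {}
    int j;
};

struct Daughter : public Mother, public Father {
    int k;
};

typedef void (Daughter::*DaughterMethod)();

// Hypothetical mirror of a method pointer, assuming the Itanium C++ ABI on
// x86-64 (GCC/Clang on Linux). This is not a standard type.
struct RawMethodPtr {
    std::uintptr_t ptr;  // address, or 1 + byte offset of the VTable entry if virtual
    std::ptrdiff_t adj;  // offset added to the instance pointer before the call
};

RawMethodPtr decode(DaughterMethod m) {
    static_assert(sizeof(DaughterMethod) == sizeof(RawMethodPtr),
                  "unexpected method pointer size");
    RawMethodPtr raw;
    std::memcpy(&raw, &m, sizeof raw);
    return raw;
}

bool is_virtual(const RawMethodPtr& raw) {
    return (raw.ptr & 1) != 0;  // the flag lives in the lowest bit of ptr
}
```

For comparison, decode(&Father::plain) should give an even ptr (the function's address, which the compiler keeps aligned so the flag bit stays free) with the same adj of 16.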

Therefore, calling a method through a method pointer requires about 20 assembly instructions in the worst case:

  1. Get the offset to apply to the instance pointer;
  2. Apply it;
  3. Check whether we are calling a virtual member function;
  4. If so, subtract 1 to get the position of the entry in the VTable;
  5. Get the VTable;
  6. Get the method address;
  7. Call the method.
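The seven steps can be mimicked in C++ itself. call_by_hand is a hypothetical function that decodes a method pointer with the same assumed Itanium x86-64 layout and performs each step explicitly; it should return the same value as the built-in (obj->*m)() for virtual, non-virtual and base-class methods alike. It is a sketch for understanding, not something to use in real code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical hierarchy: every method returns a distinct value.
struct Mother { virtual int mother() { return 1; } virtual ~Mother() {} int i; };
struct Father {
    virtual int father()  { return 2; }
    virtual int father2() { return 3; }
    int plain() { return 4; }  // non-virtual
    virtual ~Father() {}
    int j;
};
struct Daughter : public Mother, public Father {
    int father2() { return 30; }  // overrides Father::father2
    int k;
};

typedef int (Daughter::*Method)();

// Assumed Itanium C++ ABI layout on x86-64 (not a standard type).
struct RawMethodPtr { std::uintptr_t ptr; std::ptrdiff_t adj; };

// Emulates (obj->*m)() step by step.
int call_by_hand(Daughter* obj, Method m) {
    static_assert(sizeof(Method) == sizeof(RawMethodPtr), "unexpected method pointer size");
    RawMethodPtr raw;
    std::memcpy(&raw, &m, sizeof raw);

    char* self = reinterpret_cast<char*>(obj) + raw.adj;   // 1-2. get and apply the offset
    std::uintptr_t target = raw.ptr;
    if (target & 1) {                                       // 3. virtual member function?
        std::uintptr_t entry = target - 1;                  // 4. byte offset in the VTable
        const char* vtable;                                 // 5. VTable of the adjusted object
        std::memcpy(&vtable, self, sizeof vtable);
        std::memcpy(&target, vtable + entry, sizeof target); // 6. the method (or thunk) address
    }
    typedef int (*Fn)(void*);
    return reinterpret_cast<Fn>(target)(self);              // 7. call, `this` as first argument
}
```

For &Father::father2, the offset step moves to the Father part, the VTable lookup lands on the non-virtual thunk, and the thunk moves this back before Daughter::father2 returns 30: all three costs stacked in one call.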

Conclusion

In a future article I'll cover the VTable prefix and virtual inheritance, which are less common in C++ code. In these two articles I tried to shed some light on C++'s internal mechanisms. C++ is a fast language, but it can become much less efficient because of complex class relationships. I am not saying "don't use virtual methods and method pointers"; I think programmers should be aware of their costs.

I think readability is more important than performance. Yes, C++ can carry a lot of overhead, but it will still be more efficient than many languages. Still, sometimes you can avoid virtual dispatch altogether. For example, the usual way for a beginner (and sometimes for more experienced C++ programmers) to build an abstraction is to define an interface and, for each implementation, a new class that inherits from it.

Sometimes that is the right thing to do, sometimes not. If you are asked to write a filesystem abstraction for Linux and Windows and you follow the approach described above, you'll write an iFS interface, a WindowsFS class and a LinuxFS class. It will work, but you can do even better: write WindowsFS and LinuxFS, and define a new type FS according to the platform the code is compiled for. On Linux we could imagine something like this:

```cpp
typedef LinuxFS FS;
```

With code like this, you avoid the overhead of the interface. It works well for abstracting platform-specific features, but not for data abstraction, where several implementations must coexist in the same program; there you'll need an interface.
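A slightly fuller version of the typedef approach might look like the sketch below. LinuxFS, WindowsFS and FS come from the text, but the join method, its bodies and config_path are hypothetical; testing the _WIN32 macro is one common way to detect a Windows build. Because FS is an ordinary class with no virtual methods, calls on it are bound at compile time and FS carries no VPTR.

```cpp
#include <string>

// Hypothetical platform classes; only the names come from the text.
struct LinuxFS {
    std::string join(const std::string& dir, const std::string& file) const {
        return dir + "/" + file;
    }
};

struct WindowsFS {
    std::string join(const std::string& dir, const std::string& file) const {
        return dir + "\\" + file;
    }
};

// Pick the implementation at compile time instead of through an interface.
#if defined(_WIN32)
typedef WindowsFS FS;
#else
typedef LinuxFS FS;
#endif

// Ordinary, statically bound call: no VTable lookup involved.
std::string config_path(const FS& fs) {
    return fs.join("etc", "app.conf");
}
```

On Linux, config_path(FS()) should return "etc/app.conf". The trade-off is that only one implementation exists per build, which is exactly why this fits platform abstraction but not data abstraction.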

Here are some resources:

  • CRTP
  • Wikipedia