Load-time relocation of shared libraries

This article’s aim is to explain how a modern operating system makes it possible to use shared libraries with load-time relocation. It focuses on the Linux OS running on 32-bit x86, but the general principles apply to other OSes and CPUs as well.

Note that shared libraries have many names – shared libraries, shared objects, dynamic shared objects (DSOs), dynamically linked libraries (DLLs – if you’re coming from a Windows background). For the sake of consistency, I will try to just use the name "shared library" throughout this article.

Linux, similarly to other OSes with virtual memory support, loads executables to a fixed memory address. If we examine the ELF header of some random executable, we’ll see an Entry point address:

Unlike executables, when shared libraries are being built, the linker can’t assume a known load address for their code. The reason for this is simple. Each program can use any number of shared libraries, and there’s simply no way to know in advance where any given shared library will be loaded in the process’s virtual memory. Many solutions were invented for this problem over the years, but in this article I will just focus on the ones currently used by Linux.

Note how ml_func references myglob a few times. When translated to x86 assembly, this will involve a mov instruction to pull the value of myglob from its location in memory into a register. mov requires an absolute address – so how does the linker know which address to place in it? The answer is – it doesn’t. As I mentioned above, shared libraries have no pre-defined load address – it will be decided at runtime.
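For reference, a library of the shape being discussed can be sketched as below. Only the names ml_func and myglob and the initial value 42 come from this article; the function body is an assumption, chosen so that myglob is read and written more than once.

```c
/* Global data that ml_func refers to by absolute address. */
int myglob = 42;

/* Reads and updates myglob; every access needs myglob's runtime address.
 * The body is an illustrative assumption. */
int ml_func(int a, int b)
{
    myglob += a;
    return b + myglob;
}
```

With this sketch, ml_func(1, 2) first bumps myglob to 43 and then returns 45.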

There are two main approaches to solve this problem in Linux ELF shared libraries:

Load-time relocation

Position independent code (PIC)

To create a shared library that has to be relocated at load-time, I’ll compile it without the -fPIC flag (which would otherwise trigger PIC generation):

The first interesting thing to see is the entry point of libmlreloc.so:

For simplicity, the linker just links the shared object for address 0x0 (the .text section starting at 0x3b0), knowing that the loader will move it anyway. Keep this fact in mind – it will be useful later in the article.

Now let’s look at the disassembly of the shared library, focusing on ml_func:

The .rel.dyn section of ELF is reserved for dynamic (load-time) relocations, to be consumed by the dynamic loader. There are 3 relocation entries for myglob in the section shown above, since there are 3 references to myglob in the disassembly. Let’s decipher the first one.

It says: go to offset 0x470 in this object (shared library), and apply relocation of type R_386_32 to it for symbol myglob. If we consult the ELF spec we see that relocation type R_386_32 means: take the value at the offset specified in the entry, add the address of the symbol to it, and place it back into the offset.

What do we have at offset 0x470 in the object? Recall this instruction from the disassembly of ml_func:

a1 encodes the mov instruction, so its operand starts at the next address which is 0x470. This is the 0x0 we see in the disassembly. So back to the relocation entry, we now see it says: add the address of myglob to the operand of that mov instruction. In other words it tells the dynamic loader – once you perform actual address assignment, put the real address of myglob into 0x470, thus replacing the operand of mov by the correct symbol value. Neat, huh?
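The patching step described above can be sketched in C. The helper name apply_r_386_32 and the idea of treating the loaded library as a plain byte buffer are assumptions for illustration; this is not the dynamic loader’s actual code, and it assumes the host stores integers the same way x86 does (little-endian).

```c
#include <stdint.h>
#include <string.h>

/* R_386_32: read the 32-bit word at `offset`, add the symbol's address,
 * and write the sum back to the same place. */
static void apply_r_386_32(uint8_t *image, uint32_t offset, uint32_t sym_addr)
{
    uint32_t word;
    memcpy(&word, image + offset, sizeof word);   /* value currently stored (0x0 here) */
    word += sym_addr;
    memcpy(image + offset, &word, sizeof word);   /* patched operand */
}
```

Applied to the bytes a1 00 00 00 00 with an offset of 1, this leaves the a1 opcode alone and overwrites the four operand bytes with the symbol’s address.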

Note also the "Sym. value" column in the relocation section, which contains 0x200C for myglob. This is the offset of myglob in the virtual memory image of the shared library (which, recall, the linker assumes is just loaded at 0x0). This value can also be examined by looking at the symbol table of the library, for example with nm:

This output also provides the offset of myglob inside the library. D means the symbol is in the initialized data section (.data).

To see the load-time relocation in action, I will use our shared library from a simple driver executable. When running this executable, the OS will load the shared library and relocate it appropriately.

This is a rather weak deterrent, however. There is a way to make sense of it all. But first, let’s talk about the segments our shared library consists of:

To follow the myglob symbol, we’re interested in the second segment listed here. Note a couple of things:

In the section to segment mapping at the bottom, segment 01 is said to contain the .data section, which is the home of myglob

The VirtAddr column specifies that the second segment starts at 0x1f04 and has size 0x10c, meaning that it extends until 0x2010 and thus contains myglob which is at 0x200C.
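The range check behind that second observation is simple enough to write down. The helper name segment_contains is an assumption; the numbers are the ones quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr lies inside the half-open range [vaddr, vaddr + size). */
static bool segment_contains(uint32_t vaddr, uint32_t size, uint32_t addr)
{
    return addr >= vaddr && addr - vaddr < size;
}
```

segment_contains(0x1f04, 0x10c, 0x200c) is true, since the segment ends at 0x1f04 + 0x10c = 0x2010.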

So I’m going to write the following code into driver.c:

header_handler implements the callback for dl_iterate_phdr. It will get called for all libraries and report their names and load addresses, along with all their segments. It also invokes ml_func, which is taken from the libmlreloc.so shared library.
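A callback of this kind could be sketched as follows. This is not the original driver.c: the name report_segments and the counter passed through the data pointer are assumptions, and the call to ml_func is left out so the sketch does not depend on the library. dl_iterate_phdr and struct dl_phdr_info are glibc facilities declared in link.h.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Callback for dl_iterate_phdr: print each loaded object's name and base
 * address, then the final address of every segment (program header).
 * `data` is a hypothetical int counter of visited objects; it may be NULL. */
static int report_segments(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    const char *name = (info->dlpi_name && info->dlpi_name[0])
                           ? info->dlpi_name : "(main program)";
    printf("name=%s base=%#llx (%d segments)\n", name,
           (unsigned long long)info->dlpi_addr, (int)info->dlpi_phnum);

    for (int i = 0; i < (int)info->dlpi_phnum; i++) {
        /* Final address = load base + link-time virtual address. */
        unsigned long long addr =
            (unsigned long long)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
        printf("  segment %2d: address=%#llx\n", i, addr);
    }

    if (data)
        ++*(int *)data;
    return 0;   /* zero tells dl_iterate_phdr to keep iterating */
}
```

A driver would pass it to the iterator with something like dl_iterate_phdr(report_segments, &count), where count is an int initialized to zero.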

To compile and link this driver with our shared library, run:

Since driver reports all the libraries it loads (even implicitly, like libc or the dynamic loader itself), the output is lengthy and I will just focus on the report about libmlreloc.so. Note that the 6 segments are the same segments reported by readelf, but this time relocated into their final memory locations.

Let’s do some math. The output says libmlreloc.so was placed at virtual address 0x12e000. We’re interested in the second segment, which as we’ve seen in readelf is at offset 0x1f04. Indeed, we see in the output it was loaded to address 0x12ff04. And since myglob is at offset 0x200c in the file, we’d expect it to now be at address 0x13000c.
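The same arithmetic as a tiny C helper; the name runtime_address is an assumption.

```c
#include <stdint.h>

/* Address after loading = base chosen by the loader + link-time address. */
static uint32_t runtime_address(uint32_t load_base, uint32_t link_vaddr)
{
    return load_base + link_vaddr;
}
```

runtime_address(0x12e000, 0x1f04) gives 0x12ff04 for the segment, and runtime_address(0x12e000, 0x200c) gives 0x13000c for myglob.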

So, let’s ask GDB:

Excellent! But what about the code of ml_func which refers to myglob? Let’s ask GDB again:

As expected, the real address of myglob was placed in all the mov instructions referring to it, just as the relocation entries specified.

So far this article demonstrated relocation of data references – using the global variable myglob as an example. Another thing that needs to be relocated is code references – in other words, function calls. This section is a brief guide on how this gets done. The pace is much faster than in the rest of this article, since I can now assume the reader understands what relocation is all about.

Without further ado, let’s get to it. I’ve modified the code of the shared library to be the following:

ml_util_func was added and it’s being used by ml_func. Here’s the disassembly of ml_func in the linked shared library:

What’s interesting here is the instruction at address 0x4b3 – it’s the call to ml_util_func. Let’s dissect it:

e8 is the opcode for call. The argument of this call is the offset relative to the next instruction. In the disassembly above, this argument is 0xfffffffc, or simply -4. So the call currently points to itself. This clearly isn’t right – but let’s not forget about relocation. Here’s what the relocation section of the shared library looks like now:
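How the processor turns that argument into a destination can be sketched like this. The helper name call_target is an assumption; the 5-byte length is that of an e8 call with a 32-bit operand.

```c
#include <stdint.h>

/* Destination of a 5-byte x86 `call rel32` (opcode e8) located at call_addr. */
static uint32_t call_target(uint32_t call_addr, uint32_t rel32)
{
    uint32_t next_insn = call_addr + 5;   /* 1 opcode byte + 4 operand bytes */
    return next_insn + rel32;             /* unsigned wraparound acts as a signed add */
}
```

call_target(0x4b3, 0xfffffffc) evaluates to 0x4b4, an address inside the call instruction itself.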

If we compare it to the previous invocation of readelf -r, we’ll notice a new entry added for ml_util_func. This entry points at address 0x4b4 which is the argument of the call instruction, and its type is R_386_PC32. This relocation type is more complicated than R_386_32, but not by much.

It means the following: take the value at the offset specified in the entry, add the address of the symbol to it, subtract the address of the offset itself, and place it back into the word at the offset. Recall that this relocation is done at load-time, when the final load addresses of the symbol and the relocated offset itself are already known. These final addresses participate in the computation.

What does this do? Basically, it’s a relative relocation, taking its location into account and thus suitable for arguments of instructions with relative addressing (which the e8 call is). I promise it will become clearer once we get to the real numbers.

I’m now going to build the driver code and run it under GDB again, to see this relocation in action. Here’s the GDB session, followed by explanations:

The important parts here are:

ml_util_func was loaded to address 0x0012e49c

The address of the relocated offset is 0x0012e4b4

The call in ml_func to ml_util_func was patched to place 0xffffffe4 in the argument (I disassembled ml_func with the /r flag to show raw hex in addition to disassembly), which is interpreted as the correct offset to ml_util_func.

Obviously we’re most interested in how the last item was done. Again, it’s time for some math. Interpreting the R_386_PC32 relocation entry mentioned above, we have:

Take the value at the offset specified in the entry (0xfffffffc), add the address of the symbol to it (0x0012e49c), subtract the address of the offset itself (0x0012e4b4), and place it back into the word at the offset. Everything is done assuming 32-bit two’s complement, of course. The result is 0xffffffe4, as expected.
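The computation fits in one line of C. The function name r_386_pc32 is an assumption; unsigned 32-bit arithmetic wraps around, which matches the two’s complement behaviour described above.

```c
#include <stdint.h>

/* R_386_PC32: new word = value already at the offset
 *                        + address of the symbol
 *                        - address of the offset being patched. */
static uint32_t r_386_pc32(uint32_t addend, uint32_t sym_addr, uint32_t place_addr)
{
    return addend + sym_addr - place_addr;
}
```

r_386_pc32(0xfffffffc, 0x0012e49c, 0x0012e4b4) yields 0xffffffe4, which is -28. The instruction after the call sits at 0x0012e4b8, and 0x0012e4b8 - 28 is 0x0012e49c, the address of ml_util_func.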

This is a "bonus" section that discusses some peculiarities of the implementation of shared library loading in Linux. If all you wanted was to understand how relocations are done, you can safely skip it.

When trying to understand the call relocation of ml_util_func, I must admit I scratched my head for some time. Recall that the argument of call is a relative offset. Surely the offset between the call and ml_util_func itself doesn’t change when the library is loaded – they both are in the code segment which gets moved as one whole chunk. So why is the relocation needed at all?

Here’s a small experiment to try: go back to the code of the shared library, add static to the declaration of ml_util_func. Re-compile and look at the output of readelf -r again.
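The modified library might look like the sketch below. Only the names and the static qualifier come from the text; the function bodies are assumptions.

```c
int myglob = 42;

/* static gives ml_util_func internal linkage, so it is not visible
 * outside this source file. The body is an illustrative assumption. */
static int ml_util_func(int a)
{
    return a + 1;
}

int ml_func(int a, int b)
{
    int c = b + ml_util_func(a);
    myglob += c;
    return b + myglob;
}
```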

Done? Anyway, I will reveal the outcome – the relocation is gone! Examine the disassembly of ml_func – there’s now a correct offset placed as the argument of call – no relocation required. What’s going on?

When tying global symbol references to their actual definitions, the dynamic loader has some rules about the order in which shared libraries are searched. The user can also influence this order by setting the LD_PRELOAD environment variable.
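A toy model of that lookup rule is sketched below. It is not the loader’s algorithm (the real one uses hash tables and also honors symbol visibility and versioning); the struct layout and the name lookup_global are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a loaded object and its exported symbols. */
struct sym    { const char *name; unsigned long addr; };
struct object { const char *name; const struct sym *syms; size_t nsyms; };

/* Walk the objects in search order; the first definition found wins.
 * Returns 0 if no object defines the symbol. */
static unsigned long lookup_global(const struct object *objs, size_t nobjs,
                                   const char *symname)
{
    for (size_t i = 0; i < nobjs; i++)
        for (size_t j = 0; j < objs[i].nsyms; j++)
            if (strcmp(objs[i].syms[j].name, symname) == 0)
                return objs[i].syms[j].addr;
    return 0;
}
```

Because an object earlier in the search order may define the same global name, a call to a non-static function is left for the loader to resolve, while a static function cannot be overridden and can be bound when the library is linked.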

Again, this is a bonus section that discusses an advanced topic. It can be skipped safely if you’re tired of this stuff.

In the example above, myglob was only used internally in the shared library. What happens if we reference it from the program (driver.c)? After all, myglob is a global variable and thus visible externally.

Let’s modify driver.c to the following (note I’ve removed the segment iteration code):

It now prints the address of myglob. The output is:

Wait, something doesn’t compute here. Isn’t myglob in the shared library’s address space? 0x804xxxx looks like the program’s address space. What’s going on?

Recall that the program/executable is not relocatable, and thus its data addresses have to be bound at link time. Therefore, the linker has to create a copy of the variable in the program’s address space, and the dynamic loader will use that as the relocation address. This is similar to the discussion in the previous section – in a sense, myglob in the main program overrides the one in the shared library, and according to the global symbol lookup rules, it’s being used instead. If we examine ml_func in GDB, we’ll see the correct reference made to myglob:

This makes sense because an R_386_32 relocation for myglob still exists in libmlreloc.so, and the dynamic loader makes it point to the correct place where myglob now lives.

This is all great, but something is missing. myglob is initialized in the shared library (to 42) – how does this initialization value get to the address space of the program? It turns out there’s a special relocation entry that the linker builds into the program (so far we’ve only been examining relocation entries in the shared library):

Note the R_386_COPY relocation for myglob. It simply means: copy the value from the symbol’s address into this offset. The dynamic loader performs this when it loads the shared library. How does it know how much to copy? The symbol table section contains the size of each symbol; for example the size for myglob in the .symtab section of libmlreloc.so is 4.
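In effect the copy relocation boils down to a memcpy. The sketch below assumes a hypothetical helper name, apply_r_386_copy, and stands in for what the loader does rather than reproducing its code.

```c
#include <stddef.h>
#include <string.h>

/* R_386_COPY: copy `size` bytes from the symbol's definition in the shared
 * library into the space the linker reserved for it in the executable. */
static void apply_r_386_copy(void *exe_copy, const void *lib_symbol, size_t size)
{
    memcpy(exe_copy, lib_symbol, size);
}
```

For myglob the size is 4, so the initial value 42 stored in the library ends up in the executable’s copy.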

I think this is a pretty cool example that shows how the process of executable linking and loading is orchestrated together. The linker puts special instructions in the output for the dynamic loader to consume and execute.

Load-time relocation is one of the methods used in Linux (and other OSes) to resolve internal data and code references in shared libraries when loading them into memory. These days, position independent code (PIC) is a more popular approach, and some modern systems (such as x86-64) no longer support load-time relocation.

Regardless of the motivation, I hope this article has helped to shed some light on the magic going behind the scenes of linking and loading shared libraries in a modern OS.

Link-time relocation happens in the process of combining multiple object files into an executable (or shared library). It involves quite a lot of relocations to resolve symbol references between the object files. Link-time relocation is a more complex topic than load-time relocation, and I won’t cover it in this article.

This can be made possible by compiling all your libraries into static libraries (with ar combining object files instead of gcc -shared), and providing the -static flag to gcc when linking the executable – to avoid linkage with the shared version of libc.

ml simply stands for "my library". Also, the code itself is absolutely non-sensical and only used for purposes of demonstration.

Also called "dynamic linker". It’s a shared object itself (though it can also run as an executable), residing at /lib/ld-linux.so.2 (the last number is the SO version and may be different).

You can provide the -l flag to objdump to add C source lines into the disassembly, making it clearer what gets compiled to what. I’ve omitted it here to make the output shorter.

I’m looking at the left-hand side of the output of objdump, where the raw memory bytes are. a1 00 00 00 00 means mov to eax with operand 0x0, which is interpreted by the disassembler as ds:0x0.

So ldd invoked on the executable will report a different load address for the shared library each time it’s run.

Experienced readers will probably note that I could ask GDB about i shared to get the load-address of the shared library. However, i shared only mentions the load location of the whole library (or, even more accurately, its entry point), and I was interested in the segments.

What, 0x12e000 again? Didn’t I just talk about load-address randomization? It turns out the dynamic loader can be manipulated to turn this off, for purposes of debugging. This is exactly what GDB is doing.

Unless it’s passed the -Bsymbolic flag. Read all about it in the man page of ld.

