
Peering Inside the PE: A Tour of the Win32 Portable Executable File Format - Part 2

The Section Table

Between the PE header and the raw data for the image's sections lies the section table. The section table is essentially a phone book containing information about each section in the image. The sections in the image are sorted by their starting addresses (RVAs), rather than alphabetically.

Now I can better clarify what a section is. In an NE file, your program's code and data are stored in distinct "segments" in the file. Part of the NE header is an array of structures, one for each segment your program uses. Each structure in the array contains information about one segment. The information stored includes the segment's type (code or data), its size, and its location elsewhere in the file. In a PE file, the section table is analogous to the segment table in the NE file. Unlike an NE file segment table, though, a PE section table doesn't store a selector value for each code or data chunk. Instead, each section table entry stores an address where the file's raw data has been mapped into memory. While sections are analogous to 32-bit segments, they really aren't individual segments. They're really just memory ranges in a process's virtual address space.

Another area where PE files differ from NE files is how they manage the supporting data that your program doesn't use, but the operating system does; for example, the list of DLLs that the executable uses or the location of the fixup table. In an NE file, resources aren't considered segments. Even though they have selectors assigned to them, information about resources is not stored in the NE header's segment table. Instead, resources are relegated to a separate table towards the end of the NE header. Information about imported and exported functions also doesn't warrant its own segment; it's crammed into the NE header.

The story with PE files is different. Anything that might be considered vital code or data is stored in a full-fledged section. Thus, information about imported functions is stored in its own section, as is the table of functions that the module exports. The same goes for the relocation data. Any code or data that might be needed by either the program or the operating system gets its own section.

Before I discuss specific sections, I need to describe the data that the operating system manages the sections with. Immediately following the PE header in memory is an array of IMAGE_SECTION_HEADERs. The number of elements in this array is given in the PE header (the IMAGE_NT_HEADERS.FileHeader.NumberOfSections field). I used PEDUMP to output the section table and all of the sections' fields and attributes. Table 4 shows the PEDUMP output of a section table for a typical EXE file, and Table 5 shows the section table in an OBJ file.

Table 4. A Typical Section Table from an EXE File

  01 .text     VirtSize: 00005AFA  VirtAddr:  00001000
    raw data offs:   00000400  raw data size: 00005C00
    relocation offs: 00000000  relocations:   00000000
    line # offs:     00009220  line #'s:      0000020C
    characteristics: 60000020
      CODE  MEM_EXECUTE  MEM_READ

  02 .bss      VirtSize: 00001438  VirtAddr:  00007000
    raw data offs:   00000000  raw data size: 00001600
    relocation offs: 00000000  relocations:   00000000
    line # offs:     00000000  line #'s:      00000000
    characteristics: C0000080
      UNINITIALIZED_DATA  MEM_READ  MEM_WRITE

  03 .rdata    VirtSize: 0000015C  VirtAddr:  00009000
    raw data offs:   00006000  raw data size: 00000200
    relocation offs: 00000000  relocations:   00000000
    line # offs:     00000000  line #'s:      00000000
    characteristics: 40000040
      INITIALIZED_DATA  MEM_READ

  04 .data     VirtSize: 0000239C  VirtAddr:  0000A000
    raw data offs:   00006200  raw data size: 00002400
    relocation offs: 00000000  relocations:   00000000
    line # offs:     00000000  line #'s:      00000000
    characteristics: C0000040
      INITIALIZED_DATA  MEM_READ  MEM_WRITE

  05 .idata    VirtSize: 0000033E  VirtAddr:  0000D000
    raw data offs:   00008600  raw data size: 00000400
    relocation offs: 00000000  relocations:   00000000
    line # offs:     00000000  line #'s:      00000000
    characteristics: C0000040
      INITIALIZED_DATA  MEM_READ  MEM_WRITE

  06 .reloc    VirtSize: 000006CE  VirtAddr:  0000E000
    raw data offs:   00008A00  raw data size: 00000800
    relocation offs: 00000000  relocations:   00000000
    line # offs:     00000000  line #'s:      00000000
    characteristics: 42000040
      INITIALIZED_DATA  MEM_DISCARDABLE  MEM_READ

Table 5. A Typical Section Table from an OBJ File

  01 .drectve  PhysAddr: 00000000  VirtAddr:  00000000
    raw data offs:   000000DC  raw data size: 00000026
    relocation offs: 00000000  relocations:   00000000
    line # offs:     00000000  line #'s:      00000000
    characteristics: 00100A00
      LNK_INFO  LNK_REMOVE

  02 .debug$S  PhysAddr: 00000026  VirtAddr:  00000000
    raw data offs:   00000102  raw data size: 000016D0
    relocation offs: 000017D2  relocations:   00000032
    line # offs:     00000000  line #'s:      00000000
    characteristics: 42100048
      INITIALIZED_DATA  MEM_DISCARDABLE  MEM_READ

  03 .data     PhysAddr: 000016F6  VirtAddr:  00000000
    raw data offs:   000019C6  raw data size: 00000D87
    relocation offs: 0000274D  relocations:   00000045
    line # offs:     00000000  line #'s:      00000000
    characteristics: C0400040
      INITIALIZED_DATA  MEM_READ  MEM_WRITE

  04 .text     PhysAddr: 0000247D  VirtAddr:  00000000
    raw data offs:   000029FF  raw data size: 000010DA
    relocation offs: 00003AD9  relocations:   000000E9
    line # offs:     000043F3  line #'s:      000000D9
    characteristics: 60500020
      CODE  MEM_EXECUTE  MEM_READ

  05 .debug$T  PhysAddr: 00003557  VirtAddr:  00000000
    raw data offs:   00004909  raw data size: 00000030
    relocation offs: 00000000  relocations:   00000000
    line # offs:     00000000  line #'s:      00000000
    characteristics: 42100048
      INITIALIZED_DATA  MEM_DISCARDABLE  MEM_READ
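To make the "array immediately following the PE header" concrete, here is a minimal sketch of how a dumper might locate the section table in a raw file buffer. The helper name find_section_table and the rd16/rd32 readers are hypothetical; the offsets used (e_lfanew at 0x3C in the MS-DOS header, a 4-byte signature, a 20-byte IMAGE_FILE_HEADER, and 40-byte section headers) follow the WINNT.H layouts, and the sketch assumes a well-formed 32-bit PE file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read little-endian values from a byte buffer without alignment assumptions. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

#define DOS_LFANEW_OFFSET   0x3Cu /* e_lfanew field of the MS-DOS header */
#define PE_SIGNATURE_SIZE   4u    /* "PE\0\0"                            */
#define FILE_HEADER_SIZE    20u   /* sizeof(IMAGE_FILE_HEADER)           */
#define SECTION_HEADER_SIZE 40u   /* sizeof(IMAGE_SECTION_HEADER)        */

/*
 * Hypothetical helper: returns the file offset of the first
 * IMAGE_SECTION_HEADER and stores NumberOfSections in *count.
 * Returns 0 if the buffer is too small or lacks the PE signature.
 */
static size_t find_section_table(const uint8_t *file, size_t size, uint16_t *count)
{
    assert(file != NULL && count != NULL);
    if (size < DOS_LFANEW_OFFSET + 4u) return 0;

    size_t pe = rd32(file + DOS_LFANEW_OFFSET);
    if (pe > size || size - pe < PE_SIGNATURE_SIZE + FILE_HEADER_SIZE) return 0;
    if (rd32(file + pe) != 0x00004550u) return 0;        /* "PE\0\0" */

    const uint8_t *fh = file + pe + PE_SIGNATURE_SIZE;   /* IMAGE_FILE_HEADER    */
    *count = rd16(fh + 2);                               /* NumberOfSections     */
    uint16_t opt_size = rd16(fh + 16);                   /* SizeOfOptionalHeader */

    /* The section table starts right after the optional header. */
    size_t table = pe + PE_SIGNATURE_SIZE + FILE_HEADER_SIZE + opt_size;
    if (table > size || (size - table) / SECTION_HEADER_SIZE < *count) return 0;
    return table;
}
```

Each entry is then found at the returned offset plus 40 times its index. Using SizeOfOptionalHeader rather than a fixed structure size keeps the sketch independent of how many data directory entries the file carries.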

Each IMAGE_SECTION_HEADER has the format described in Table 6. It's interesting to note what's missing from the information stored for each section. First off, notice that there's no indication of any PRELOAD attributes. The NE file format allows you to specify with the PRELOAD attribute which segments should be loaded at module load time. The OS/2® 2.0 LX format has something similar, allowing you to specify up to eight pages to preload. The PE format has nothing like this. Microsoft must be confident in the performance of Win32 demand-paged loading.

Table 6. IMAGE_SECTION_HEADER Formats

BYTE Name[IMAGE_SIZEOF_SHORT_NAME]

This is an 8-byte ANSI name (not UNICODE) that names the section. Most section names start with a . (such as ".text"), but this is not a requirement, as some PE documentation would have you believe. You can name your own sections with either the segment directive in assembly language, or with "#pragma data_seg" and "#pragma code_seg" in the Microsoft C/C++ compiler. It's important to note that if the section name takes up the full 8 bytes, there's no NULL terminator byte. If you're a printf devotee, you can use %.8s to avoid copying the name string to another buffer where you can NULL-terminate it.

union {
    DWORD PhysicalAddress
    DWORD VirtualSize
} Misc;

This field has different meanings in EXEs and OBJs. In an EXE, it holds the actual size of the code or data. This is the size before rounding up to the nearest file alignment multiple. The SizeOfRawData field (seems a bit of a misnomer) later on in the structure holds the rounded up value. The Borland linker reverses the meaning of these two fields and appears to be correct. For OBJ files, this field indicates the physical address of the section. The first section starts at address 0. To find the physical address in an OBJ file of the next section, add the SizeOfRawData value to the physical address of the current section.

DWORD VirtualAddress

In EXEs, this field holds the RVA to where the loader should map the section. To calculate the real starting address of a given section in memory, add the base address of the image to the section's VirtualAddress stored in this field. With Microsoft tools, the first section defaults to an RVA of 0x1000. In OBJs, this field is meaningless and is set to 0.

DWORD SizeOfRawData

In EXEs, this field contains the size of the section after it's been rounded up to the file alignment size. For example, assume a file alignment size of 0x200. If the VirtualSize field from above says that the section is 0x35A bytes in length, this field will say that the section is 0x400 bytes long. In OBJs, this field contains the exact size of the section emitted by the compiler or assembler. In other words, for OBJs, it's equivalent to the VirtualSize field in EXEs.

DWORD PointerToRawData

This is the file-based offset of where the raw data emitted by the compiler or assembler can be found. If your program memory maps a PE or COFF file itself (rather than letting the operating system load it), this field is more important than the VirtualAddress field. You'll have a completely linear file mapping in this situation, so you'll find the data for the sections at this offset, rather than at the RVA specified in the VirtualAddress field.

DWORD PointerToRelocations

In OBJs, this is the file-based offset to the relocation information for this section. The relocation information for each OBJ section immediately follows the raw data for that section. In EXEs, this field (and the subsequent field) are meaningless, and set to 0. When the linker creates the EXE, it resolves most of the fixups, leaving only base address relocations and imported functions to be resolved at load time. The information about base relocations and imported functions is kept in their own sections, so there's no need for an EXE to have per-section relocation data following the raw section data.

DWORD PointerToLinenumbers

This is the file-based offset of the line number table. A line number table correlates source file line numbers to the addresses of the code generated for a given line. In modern debug formats like the CodeView format, line number information is stored as part of the debug information. In the COFF debug format, however, the line number information is stored separately from the symbolic name/type information. Usually, only code sections (such as .text) have line numbers. In EXE files, the line numbers are collected towards the end of the file, after the raw data for the sections. In OBJ files, the line number table for a section comes after the raw section data and the relocation table for that section.

WORD NumberOfRelocations

The number of relocations in the relocation table for this section (the PointerToRelocations field from above). This field seems relevant only for OBJ files.

WORD NumberOfLinenumbers

The number of line numbers in the line number table for this section (the PointerToLinenumbers field from above).

DWORD Characteristics

What most programmers call flags, the COFF/PE format calls characteristics. This field is a set of flags that indicate the section's attributes (such as code/data, readable, or writeable). For a complete list of all possible section attributes, see the IMAGE_SCN_XXX_XXX #defines in WINNT.H. Some of the more important flags are shown below:

0x00000020 This section contains code. Usually set in conjunction with the executable flag (0x20000000).

0x00000040 This section contains initialized data. Almost all sections except executable and the .bss section have this flag set.

0x00000080 This section contains uninitialized data (for example, the .bss section).

0x00000200 This section contains comments or some other type of information. A typical use of this section is the .drectve section emitted by the compiler, which contains commands for the linker.

0x00000800 This section's contents shouldn't be put in the final EXE file. These sections are used by the compiler/assembler to pass information to the linker.

0x02000000 This section can be discarded, since it's not needed by the process once it's been loaded. The most common discardable section is the base relocations (.reloc).

0x10000000 This section is shareable. When used with a DLL, the data in this section will be shared among all processes using the DLL. The default is for data sections to be nonshared, meaning that each process using a DLL gets its own copy of this section's data. In more technical terms, a shared section tells the memory manager to set the page mappings for this section such that all processes using the DLL refer to the same physical page in memory. To make a section shareable, use the SHARED attribute at link time. For example,

LINK /SECTION:MYDATA,RWS ...

tells the linker that the section called MYDATA should be readable, writeable, and shared.

0x20000000 This section is executable. This flag is usually set whenever the "contains code" flag (0x00000020) is set.

0x40000000 This section is readable. This flag is almost always set for sections in EXE files.

0x80000000 The section is writeable. If this flag isn't set in an EXE's section, the loader should mark the memory mapped pages as read-only or execute-only. Typical sections with this attribute are .data and .bss. Interestingly, the .idata section also has this attribute set.
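Three of the details above lend themselves to a short sketch: printing a Name that may lack a terminator, the rounding that produces SizeOfRawData, and decoding Characteristics the way PEDUMP prints it. The SCN_ macro names and the three helper functions are illustrative stand-ins (WINNT.H uses longer IMAGE_SCN_ names); the flag values are the ones listed above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SHORT_NAME_LEN 8   /* IMAGE_SIZEOF_SHORT_NAME */

/* Flag values taken from the list above; the macro names are shortened. */
#define SCN_CNT_CODE               0x00000020u
#define SCN_CNT_INITIALIZED_DATA   0x00000040u
#define SCN_CNT_UNINITIALIZED_DATA 0x00000080u
#define SCN_MEM_DISCARDABLE        0x02000000u
#define SCN_MEM_SHARED             0x10000000u
#define SCN_MEM_EXECUTE            0x20000000u
#define SCN_MEM_READ               0x40000000u
#define SCN_MEM_WRITE              0x80000000u

/* Copy a section name into a buffer of at least SHORT_NAME_LEN + 1 bytes. */
static void section_name(const uint8_t name[SHORT_NAME_LEN], char *out)
{
    /* %.8s stops after 8 characters even when the name has no terminator. */
    snprintf(out, SHORT_NAME_LEN + 1, "%.8s", (const char *)name);
}

/* Round size up to the next multiple of alignment (a power of two). */
static uint32_t align_up(uint32_t size, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Write the names of the flags set in ch into out, separated by spaces. */
static void describe_flags(uint32_t ch, char *out, size_t out_size)
{
    static const struct { uint32_t bit; const char *name; } table[] = {
        { SCN_CNT_CODE,               "CODE" },
        { SCN_CNT_INITIALIZED_DATA,   "INITIALIZED_DATA" },
        { SCN_CNT_UNINITIALIZED_DATA, "UNINITIALIZED_DATA" },
        { SCN_MEM_DISCARDABLE,        "MEM_DISCARDABLE" },
        { SCN_MEM_SHARED,             "MEM_SHARED" },
        { SCN_MEM_EXECUTE,            "MEM_EXECUTE" },
        { SCN_MEM_READ,               "MEM_READ" },
        { SCN_MEM_WRITE,              "MEM_WRITE" },
    };
    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (ch & table[i].bit) {
            if (out[0] != '\0')
                strncat(out, " ", out_size - strlen(out) - 1);
            strncat(out, table[i].name, out_size - strlen(out) - 1);
        }
    }
}
```

With the .text characteristics from Table 4 (60000020), describe_flags yields the CODE, MEM_EXECUTE, and MEM_READ names, and align_up(0x35A, 0x200) reproduces the 0x400 figure from the SizeOfRawData description.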

Also missing from the PE format is the notion of page tables. The OS/2 equivalent of an IMAGE_SECTION_HEADER in the LX format doesn't point directly to where the code or data for a section can be found in the file. Instead, it refers to a page lookup table that specifies attributes and the locations of specific ranges of pages within a section. The PE format dispenses with all that, and guarantees that a section's data will be stored contiguously within the file. Of the two formats, the LX method may allow more flexibility, but the PE style is significantly simpler and easier to work with. Having written file dumpers for both formats, I can vouch for this!

Another welcome change in the PE format is that the locations of items are stored as simple DWORD offsets. In the NE format, the location of almost everything is stored as a sector value. To find the real offset, you need to first look up the alignment unit size in the NE header and convert it to a sector size (typically 16 or 512 bytes). You then need to multiply the sector size by the specified sector offset to get an actual file offset. If by chance something isn't stored as a sector offset in an NE file, it is probably stored as an offset relative to the NE header. Since the NE header isn't at the beginning of the file, you need to drag around the file offset of the NE header in your code. All in all, the PE format is much easier to work with than the NE, LX, or LE formats (assuming you can use memory-mapped files).
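Because everything is a plain DWORD, the two address calculations described for VirtualAddress and PointerToRawData reduce to a few lines. The sketch below is one possible way to write them; SectionInfo, NO_FILE_OFFSET, and both function names are made up for illustration, and only the four fields needed here are modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the fields needed here; the names follow IMAGE_SECTION_HEADER. */
typedef struct {
    uint32_t VirtualSize;
    uint32_t VirtualAddress;    /* RVA of the section once loaded    */
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;  /* file offset of the section's data */
} SectionInfo;

#define NO_FILE_OFFSET 0xFFFFFFFFu  /* hypothetical "not in the file" marker */

/* Address of an item in a loaded image: image base plus RVA. */
static uint32_t rva_to_address(uint32_t image_base, uint32_t rva)
{
    return image_base + rva;
}

/*
 * File offset of an item in a linearly mapped file: find the section
 * whose raw data covers the RVA, then rebase onto PointerToRawData.
 */
static uint32_t rva_to_file_offset(const SectionInfo *s, size_t count, uint32_t rva)
{
    assert(s != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (s[i].PointerToRawData == 0)      /* e.g. .bss: no data in the file */
            continue;
        if (rva >= s[i].VirtualAddress &&
            rva - s[i].VirtualAddress < s[i].SizeOfRawData)
            return s[i].PointerToRawData + (rva - s[i].VirtualAddress);
    }
    return NO_FILE_OFFSET;
}
```

Fed the .text and .rdata entries from Table 4, RVA 0x9010 maps to file offset 0x6010, while any RVA inside .bss reports NO_FILE_OFFSET because that section occupies no space in the file.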

Common Sections

Having seen what sections are in general and where they're located, let's look at the common sections that you'll find in EXE and OBJ files. The list is by no means complete, but includes the sections you encounter every day (even if you're not aware of it).

The .text section is where all general-purpose code emitted by the compiler or assembler ends up. Since PE files run in 32-bit mode and aren't restricted to 16-bit segments, there's no reason to break the code from separate source files into separate sections. Instead, the linker concatenates all the .text sections from the various OBJs into one big .text section in the EXE. If you use Borland C++, the compiler emits its code to a segment named CODE. PE files produced with Borland C++ have a section named CODE rather than one called .text. I'll explain this in a minute.

It was somewhat interesting to me to find out that there was additional code in the .text section beyond what I created with the compiler or used from the run-time libraries. In a PE file, when you call a function in another module (for example, GetMessage in USER32.DLL), the CALL instruction emitted by the compiler doesn't transfer control directly to the function in the DLL (see Figure 2). Instead, the call instruction transfers control to a

JMP DWORD PTR [XXXXXXXX]

instruction that's also in the .text section. The JMP instruction indirects through a DWORD variable in the .idata section. This .idata section DWORD contains the real address of the operating system function entry point. After thinking about this for a while, I came to understand why DLL calls are implemented this way. By funneling all calls to a given DLL function through one location, the loader doesn't need to patch every instruction that calls a DLL. All the PE loader has to do is put the correct address of the target function into the DWORD in the .idata section. No call instructions need to be patched. This is in marked contrast to NE files, where each segment contains a list of fixups that need to be applied to the segment. If the segment calls a given DLL function 20 times, the loader must write the address of that function 20 times into the segment. The downside to the PE method is that you can't initialize a variable with the true address of a DLL function. For example, you would think that something like

FARPROC pfnGetMessage = GetMessage;

would put the address of GetMessage into the variable pfnGetMessage. In 16-bit Windows, this works, while in Win32 it doesn't. In Win32, the variable pfnGetMessage will end up holding the address of the JMP DWORD PTR [XXXXXXXX] thunk that I mentioned earlier. If you wanted to call through the function pointer, things would work as you'd expect. However, if you want to read the bytes at the beginning of GetMessage, you're out of luck (unless you do additional work to follow the .idata "pointer" yourself). I'll come back to this topic later, in the discussion of the import table.

Figure 2. Calling a function in another module
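The "additional work" of following the .idata pointer starts with recognizing the thunk. On 32-bit x86, JMP DWORD PTR [addr] is encoded as the bytes FF 25 followed by the 32-bit absolute address of the DWORD. The sketch below decodes that operand from a byte buffer; decode_import_thunk is a hypothetical helper, and it deliberately stops short of dereferencing the slot, since that step only makes sense inside the process that owns the memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * If code points at an x86 "JMP DWORD PTR [addr32]" instruction
 * (opcode bytes FF 25 followed by a 32-bit absolute address), store
 * that address in *slot and return 1. Otherwise return 0.
 */
static int decode_import_thunk(const uint8_t *code, size_t len, uint32_t *slot)
{
    assert(code != NULL && slot != NULL);
    if (len < 6 || code[0] != 0xFF || code[1] != 0x25)
        return 0;

    /* The operand is stored little-endian right after the opcode. */
    *slot = (uint32_t)code[2] | ((uint32_t)code[3] << 8) |
            ((uint32_t)code[4] << 16) | ((uint32_t)code[5] << 24);
    return 1;
}
```

In a running 32-bit process, the DWORD stored at the decoded address would hold the real entry point of the imported function once the loader has done its work.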

Although Borland could have had the compiler emit segments with a name of .text, it chose a default segment name of CODE. To determine a section name in the PE file, the Borland linker (TLINK32.EXE) takes the segment name from the OBJ file and truncates it to 8 characters (if necessary).

While the difference in the section names is a small matter, there is a more important difference in how Borland PE files link to other modules. As I mentioned in the .text description, all calls to functions in other DLLs go through a JMP DWORD PTR [XXXXXXXX] thunk. Under the Microsoft system, this thunk comes to the EXE from the .text section of an import library. Because the library manager (LIB32) creates the import library (and the thunk) when you link the external DLL, the linker doesn't have to "know" how to generate these thunks itself. The import library is really just some more code and data to link into the PE file.

The Borland system of dealing with imported functions is simply an extension of the way things were done for 16-bit NE files. The import libraries that the Borland linker uses are really just a list of function names along with the name of the DLL they're in. TLINK32 is therefore responsible for determining which fixups are to external DLLs, and generating an appropriate JMP DWORD PTR [XXXXXXXX] thunk for each one. TLINK32 stores the thunks that it creates in a section named .icode.

Just as .text is the default section for code, the .data section is where your initialized data goes. This data consists of global and static variables that are initialized at compile time. It also includes string literals. The linker combines all the .data sections from the OBJ and LIB files into one .data section in the EXE. Local variables are located on a thread's stack, and take no room in the .data or .bss sections.

The .bss section is where any uninitialized static and global variables are stored. The linker combines all the .bss sections in the OBJ and LIB files into one .bss section in the EXE. In the section table, the raw data offset (the PointerToRawData field) for the .bss section is set to 0, indicating that this section doesn't take up any space in the file. TLINK32 doesn't emit this section. Instead, it extends the virtual size of the DATA section.

.CRT is another initialized data section utilized by the Microsoft C/C++ run-time libraries (hence the name). Why this data couldn't go into the standard .data section is beyond me.

The .rsrc section contains all the resources for the module. In the early days of Windows NT, the RES file output of the 16-bit RC.EXE wasn't in a format that the Microsoft PE linker could understand. The CVTRES program converted these RES files into a COFF-format OBJ, placing the resource data into a .rsrc section within the OBJ. The linker could then treat the resource OBJ as just another OBJ to link in, allowing the linker to not "know" anything special about resources. More recent linkers from Microsoft appear to be able to process RES files directly.

The .idata section contains information about functions (and data) that the module imports from other DLLs. This section is equivalent to an NE file's module reference table. A key difference is that each function that a PE file imports is specifically listed in this section. To find the equivalent information in an NE file, you'd have to go digging through the relocations at the end of the raw data for each of the segments.

The .edata section is a list of the functions and data that the PE file exports for other modules. Its NE file equivalent is the combination of the entry table, the resident names table, and the nonresident names table. Unlike in 16-bit Windows, there's seldom a reason to export anything from an EXE file, so you usually only see .edata sections in DLLs. When using Microsoft tools, the data in the .edata section comes to the PE file via the EXP file. Put another way, the linker doesn't generate this information on its own. Instead, it relies on the library manager (LIB32) to scan the OBJ files and create the EXP file that the linker adds to its list of modules to link. Yes, that's right! Those pesky EXP files are really just OBJ files with a different extension.

The .reloc section holds a table of base relocations. A base relocation is an adjustment to an instruction or initialized variable value that's needed if the loader couldn't load the file where the linker assumed it would. If the loader is able to load the image at the linker's preferred base address, the loader completely ignores the relocation information in this section. If you want to take a chance and hope that the loader can always load the image at the assumed base address, you can tell the linker to strip this information with the /FIXED option. While this may save space in the executable file, it may cause the executable not to work on other Win32-based implementations. For example, say you built an EXE for Windows NT and based the EXE at 0x10000. If you told the linker to strip the relocations, the EXE wouldn't run under Windows 95, where the address 0x10000 is already in use.

It's important to note that the JMP and CALL instructions that the compiler generates use offsets relative to the instruction, rather than actual offsets in the 32-bit flat segment. If the image needs to be loaded somewhere other than where the linker assumed for a base address, these instructions don't need to change, since they use relative addressing. As a result, there are not as many relocations as you might think. Relocations are usually only needed for instructions that use a 32-bit offset to some data. For example, let's say you had the following global variable declarations:

int i;

int *ptr = &i;

If the linker assumed an image base of 0x10000, the address of the variable i will end up being something like 0x12004. At the memory used to hold the pointer "ptr", the linker will have written out 0x12004, since that's the address of the variable i. If the loader for whatever reason decided to load the file at a base address of 0x70000, the address of i would be 0x72004. The .reloc section is a list of places in the image where the difference between the linker-assumed load address and the actual load address needs to be factored in.
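The arithmetic in this example can be sketched in a few lines of C. The function below is illustrative only: apply_base_delta is a hypothetical name, the flat list of fixup RVAs stands in for the real .reloc encoding (which is not described here), and the host is assumed to be little-endian like the image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Add (actual_base - preferred_base) to each 32-bit value named in
 * fixup_rvas. The flat RVA list is a simplification for illustration,
 * not the on-disk .reloc format.
 */
static void apply_base_delta(uint8_t *image, size_t image_size,
                             const uint32_t *fixup_rvas, size_t count,
                             uint32_t preferred_base, uint32_t actual_base)
{
    /* Unsigned subtraction wraps, so a lower actual base also works. */
    uint32_t delta = actual_base - preferred_base;

    for (size_t i = 0; i < count; i++) {
        uint32_t rva = fixup_rvas[i];
        uint32_t value;

        assert(rva <= image_size && image_size - rva >= sizeof value);
        memcpy(&value, image + rva, sizeof value);   /* unaligned-safe read */
        value += delta;
        memcpy(image + rva, &value, sizeof value);
    }
}
```

With the numbers from the text, a stored pointer of 0x12004 in an image linked at 0x10000 but loaded at 0x70000 becomes 0x72004. When the two bases match, the delta is zero and nothing changes, which is why the loader can skip the section entirely in that case.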

When you use the compiler directive __declspec(thread), the data that you define doesn't go into either the .data or .bss sections. It ends up in the .tls section, which refers to "thread local storage," and is related to the TlsAlloc family of Win32 functions. When dealing with a .tls section, the memory manager sets up the page tables so that whenever a process switches threads, a new set of physical memory pages is mapped to the .tls section's address space. This permits per-thread global variables. In most cases, it is much easier to use this mechanism than to allocate memory on a per-thread basis and store its pointer in a TlsAlloc'ed slot.

There's one unfortunate note that must be added about the .tls section and __declspec(thread) variables. In Windows NT and Windows 95, this thread local storage mechanism won't work in a DLL if the DLL is loaded dynamically by LoadLibrary. In an EXE or an implicitly loaded DLL, everything works fine. If you can't implicitly link to the DLL, but need per-thread data, you'll have to fall back to using TlsAlloc and TlsGetValue with dynamically allocated memory.

Although the .rdata section usually falls between the .data and .bss sections, your program generally doesn't see or use the data in this section. The .rdata section is used for at least two things. First, in Microsoft linker-produced EXEs, the .rdata section holds the debug directory, which is only present in EXE files. (In TLINK32 EXEs, the debug directory is in a section named .debug.) The debug directory is an array of IMAGE_DEBUG_DIRECTORY structures. These structures hold information about the type, size, and location of the various types of debug information stored in the file. Three main types of debug information appear: CodeView®, COFF, and FPO. Table 7 shows the PEDUMP output for a typical debug directory.

Table 7. A Typical Debug Directory

Type      Size      Address   FilePtr   Charactr  TimeData  Version
COFF      000065C5  00000000  00009200  00000000  2CF8CF3D  0.00
???       00000114  00000000  0000F7C8  00000000  2CF8CF3D  0.00
FPO       000004B0  00000000  0000F8DC  00000000  2CF8CF3D  0.00
CODEVIEW  0000B0B4  00000000  0000FD8C  00000000  2CF8CF3D  0.00

The debug directory isn't necessarily found at the beginning of the .rdata section. To find the start of the debug directory table, use the RVA in the seventh entry (IMAGE_DIRECTORY_ENTRY_DEBUG) of the data directory. The data directory is at the end of the PE header portion of the file. To determine the number of entries in the Microsoft linker-generated debug directory, divide the size of the debug directory (found in the size field of the data directory entry) by the size of an IMAGE_DEBUG_DIRECTORY structure. TLINK32 emits a simple count, usually 1. The PEDUMP sample program demonstrates this.
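The division described here is simple enough to show directly. In this sketch the DebugDirectoryEntry type mirrors the field order of IMAGE_DEBUG_DIRECTORY in WINNT.H (28 bytes), and debug_entry_count is a hypothetical helper that covers only the Microsoft linker case, where the data directory holds a byte size rather than a count.

```c
#include <assert.h>
#include <stdint.h>

/* Field order follows IMAGE_DEBUG_DIRECTORY in WINNT.H (28 bytes). */
typedef struct {
    uint32_t Characteristics;
    uint32_t TimeDateStamp;
    uint16_t MajorVersion;
    uint16_t MinorVersion;
    uint32_t Type;
    uint32_t SizeOfData;
    uint32_t AddressOfRawData;
    uint32_t PointerToRawData;
} DebugDirectoryEntry;

/* Number of entries in a Microsoft linker-generated debug directory. */
static uint32_t debug_entry_count(uint32_t directory_size)
{
    /* Guard against a compiler inserting padding into the mirror type. */
    assert(sizeof(DebugDirectoryEntry) == 28);
    return directory_size / (uint32_t)sizeof(DebugDirectoryEntry);
}
```

A directory like the one in Table 7, with four entries, would carry a size of 0x70 bytes. For a TLINK32-produced file the size field should be read as the count itself, as noted above.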

The other useful portion of an .rdata section is the description string. If you specified a DESCRIPTION entry in your program's DEF file, the specified description string appears in the .rdata section. In the NE format, the description string is always the first entry of the nonresident names table. The description string is intended to hold a useful text string describing the file. Unfortunately, I haven't found an easy way to find it. I've seen PE files that had the description string before the debug directory, and other files that had it after the debug directory. I'm not aware of any consistent method of finding the description string (or even if it's present at all).

The .debug$S and .debug$T sections only appear in OBJs. They store the CodeView symbol and type information. The section names are derived from the segment names used for this purpose by previous 16-bit compilers ($$SYMBOLS and $$TYPES). The sole purpose of the .debug$T section is to hold the pathname to the PDB file that contains the CodeView information for all the OBJs in the project. The linker reads in the PDB and uses it to create portions of the CodeView information that it places at the end of the finished PE file.

The .drectve section only appears in OBJ files. It contains text representations of commands for the linker. For example, in any OBJ I compile with the Microsoft compiler, the following strings appear in the .drectve section:

-defaultlib:LIBC -defaultlib:OLDNAMES

When you use __declspec(dllexport) in your code, the compiler simply emits the command-line equivalent into the .drectve section (for instance, "-export:MyFunction").

In playing around with PEDUMP, I've encountered other sections from time to time. For instance, in the Windows 95 KERNEL32.DLL, there are LOCKCODE and LOCKDATA sections. Presumably these are sections that will get special paging treatment so that they're never paged out of memory.

There are two lessons to be learned from this. First, don't feel constrained to use only the standard sections provided by the compiler or assembler. If you need a separate section for some reason, don't hesitate to create your own. In the C/C++ compiler, use #pragma code_seg and #pragma data_seg. In assembly language, just create a 32-bit segment (which becomes a section) with a name different from the standard sections. If using TLINK32, you must use a different class or turn off code segment packing. The other thing to remember is that section names that are out of the ordinary can often give a deeper insight into the purpose and implementation of a particular PE file.

PE File Imports

Earlier, I described how function calls to outside DLLs don't call the DLL directly. Instead, the CALL instruction goes to a JMP DWORD PTR [XXXXXXXX] instruction somewhere in the executable's .text section (or .icode section if you're using Borland C++). The address that the JMP instruction looks up and transfers control to is the real target address. The PE file's .idata section contains the information necessary for the loader to determine the addresses of the target functions and patch them into the executable image.

The .idata section (or import table, as I prefer to call it) begins with an array of IMAGE_IMPORT_DESCRIPTORs. There is one IMAGE_IMPORT_DESCRIPTOR for each DLL that the PE file implicitly links to. There's no field indicating the number of structures in this array. Instead, the last element of the array is indicated by an IMAGE_IMPORT_DESCRIPTOR that has fields filled with NULLs. The format of an IMAGE_IMPORT_DESCRIPTOR is shown in Table 8.

Table 8. IMAGE_IMPORT_DESCRIPTOR Format

DWORD Characteristics

At one time, this may have been a set of flags. However, Microsoft changed its meaning and never bothered to update WINNT.H. This field is really an offset (an RVA) to an array of pointers. Each of these pointers points to an IMAGE_IMPORT_BY_NAME structure.

DWORD TimeDateStamp

The time/date stamp indicating when the file was built.

DWORD ForwarderChain

This field relates to forwarding. Forwarding involves one DLL sending on references to one of its functions to another DLL. For example, in Windows NT, KERNEL32.DLL appears to forward some of its exported functions to NTDLL.DLL. An application may think it's calling a function in KERNEL32.DLL, but it actually ends up calling into NTDLL.DLL. This field contains an index into the FirstThunk array (described momentarily). The function indexed by this field will be forwarded to another DLL. Unfortunately, the format of how a function is forwarded isn't documented, and examples of forwarded functions are hard to find.

DWORD Name

This is an RVA to a NULL-terminated ASCII string containing the imported DLL's name. Common examples are "KERNEL32.DLL" and "USER32.DLL".

PIMAGE_THUNK_DATA FirstThunk

This field is an offset (an RVA) to an IMAGE_THUNK_DATA union. In almost every case, the union is interpreted as a pointer to an IMAGE_IMPORT_BY_NAME structure. If the field isn't one of these pointers, then it's supposedly treated as an export ordinal value for the DLL that's being imported. It's not clear from the documentation if you really can import a function by ordinal rather than by name.
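Since the descriptor array carries no count, a dumper has to walk it until it reaches the all-zero entry. The following sketch shows one way to do that over the five fields just described; ImportDescriptor, is_terminator, and count_import_descriptors are illustrative names, and the max_entries bound is an added safety assumption for untrusted files rather than part of the format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the five DWORD fields of IMAGE_IMPORT_DESCRIPTOR. */
typedef struct {
    uint32_t Characteristics;  /* RVA of the hint-name table */
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;             /* RVA of the DLL name        */
    uint32_t FirstThunk;       /* RVA of the second array    */
} ImportDescriptor;

/* The array ends with a descriptor whose fields are all zero. */
static int is_terminator(const ImportDescriptor *d)
{
    static const ImportDescriptor zero = { 0, 0, 0, 0, 0 };
    return memcmp(d, &zero, sizeof zero) == 0;
}

/* Count descriptors before the terminator, reading at most max_entries. */
static size_t count_import_descriptors(const ImportDescriptor *d, size_t max_entries)
{
    assert(d != NULL);
    size_t n = 0;
    while (n < max_entries && !is_terminator(&d[n]))
        n++;
    return n;
}
```

The count equals the number of DLLs the file implicitly links to, one descriptor per DLL.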

The important parts of an IMAGE_IMPORT_DESCRIPTOR are the imported DLL name and the two arrays of IMAGE_IMPORT_BY_NAME pointers. In the EXE file, the two arrays (pointed to by the Characteristics and FirstThunk fields) run parallel to each other, and are terminated by a NULL pointer entry at the end of each array. The pointers in both arrays point to an IMAGE_IMPORT_BY_NAME structure. Figure 3 shows the situation graphically. Table 9 shows the PEDUMP output for an imports table.

Figure 3. Two parallel arrays of pointers

Table 9. Imports Table from an EXE File

GDI32.dll

 Hint/Name Table: 00013064

 TimeDateStamp:   2C51B75B

 ForwarderChain:  FFFFFFFF

  Firstthunk RVA: 00013214

 Ordn  Name

   48  CreatePen

   57  CreateSolidBrush

   62  DeleteObject

  160  GetDeviceCaps

   //  Rest of table omitted...

 KERNEL32.dll

 Hint/Name Table: 0001309C

 TimeDateStamp:   2C4865A0

 ForwarderChain:  00000014

  Firstthunk RVA: 0001324C

 Ordn  Name

   83  ExitProcess

  137  GetCommandLineA

  179  GetEnvironmentStrings

  202  GetModuleHandleA

   //  Rest of table omitted...

 SHELL32.dll

 Hint/Name Table: 00013138

 TimeDateStamp:   2C41A383

 ForwarderChain:  FFFFFFFF

  Firstthunk RVA: 000132E8

 Ordn  Name

   46  ShellAboutA

 USER32.dll

 Hint/Name Table: 00013140

 TimeDateStamp:   2C474EDF

 ForwarderChain:  FFFFFFFF

  Firstthunk RVA: 000132F0

 Ordn  Name

   10  BeginPaint

   35  CharUpperA

   39  CheckDlgButton

   40  CheckMenuItem

   //  Rest of table omitted...

There is one IMAGE_IMPORT_BY_NAME structure for each function that the PE file imports. An IMAGE_IMPORT_BY_NAME structure is very simple, and looks like this:

WORD   Hint;

BYTE   Name[?];

The first field is the best guess as to what the export ordinal for the imported function is. Unlike with NE files, this value doesn't have to be correct. Instead, the loader uses it as a suggested starting value for its binary search for the exported function. Next is an ASCIIZ string with the name of the imported function.

Why are there two parallel arrays of pointers to the IMAGE_IMPORT_BY_NAME structures? The first array (the one pointed at by the Characteristics field) is left alone, and never modified. It's sometimes called the hint-name table. The second array (pointed at by the FirstThunk field) is overwritten by the PE loader. The loader iterates through each pointer in the array and finds the address of the function that each IMAGE_IMPORT_BY_NAME structure refers to. The loader then overwrites the pointer to IMAGE_IMPORT_BY_NAME with the found function's address. The [XXXXXXXX] portion of the JMP DWORD PTR [XXXXXXXX] thunk refers to one of the entries in the FirstThunk array. Since the array of pointers that's overwritten by the loader eventually holds the addresses of all the imported functions, it's called the Import Address Table.
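
The walk over the two parallel arrays can be sketched in a few lines of C. This is only a simplified model, not the Windows NT loader: the ImportByName and ExportEntry types, the fixed-size name buffer, and the function names are all hypothetical, the arrays hold ordinary pointers rather than RVAs, and the hint is tried first with a plain linear scan as the fallback instead of a binary search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for IMAGE_IMPORT_BY_NAME (fixed-size name for brevity). */
typedef struct {
    uint16_t hint;
    char     name[32];
} ImportByName;

/* Hypothetical record describing one export of the DLL being imported from. */
typedef struct {
    const char *name;
    uintptr_t   address;
} ExportEntry;

/* Try the hint first; if it is wrong, fall back to scanning every export. */
uintptr_t find_export(const ExportEntry *exports, size_t count,
                      const ImportByName *import)
{
    size_t i;

    if (import->hint < count &&
        strcmp(exports[import->hint].name, import->name) == 0)
        return exports[import->hint].address;
    for (i = 0; i < count; i++)
        if (strcmp(exports[i].name, import->name) == 0)
            return exports[i].address;
    return 0; /* not found */
}

/* Walk the NULL-terminated hint-name array and fill in the parallel
   import address table. The hint-name array itself is never modified.
   Returns the number of entries processed. */
size_t resolve_imports(const ImportByName *const *hint_names, uintptr_t *iat,
                       const ExportEntry *exports, size_t count)
{
    size_t n;

    assert(hint_names != NULL && iat != NULL);
    for (n = 0; hint_names[n] != NULL; n++)
        iat[n] = find_export(exports, count, hint_names[n]);
    return n;
}
```

After resolve_imports returns, slot n of the address table holds the address for the name in slot n of the hint-name array, which is the state a JMP DWORD PTR [XXXXXXXX] thunk relies on.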

For you Borland users, there's a slight twist to the above description. A PE file produced by TLINK32 is missing one of the arrays. In such an executable, the Characteristics field in the IMAGE_IMPORT_DESCRIPTOR (aka the hint-name array) is 0. Therefore, only the array that's pointed at by the FirstThunk field (the Import Address Table) is guaranteed to exist in all PE files. The story would end here, except that I ran into an interesting problem when writing PEDUMP. In the never-ending search for optimizations, Microsoft "optimized" the thunk array in the system DLLs for Windows NT (KERNEL32.DLL and so on). In this optimization, the pointers in the array don't point to an IMAGE_IMPORT_BY_NAME structure; rather, they already contain the address of the imported function. In other words, the loader doesn't need to look up function addresses and overwrite the thunk array with the imported function's addresses. This causes a problem for PE dumping programs that are expecting the array to contain pointers to IMAGE_IMPORT_BY_NAME structures. You might be thinking, "But Matt, why don't you just use the hint-name table array?" That would be an ideal solution, except that the hint-name table array doesn't exist in Borland files. The PEDUMP program handles all these situations, but the code is understandably messy.

Since the import address table is in a writeable section, it's relatively easy to intercept calls that an EXE or DLL makes to another DLL. Simply patch the appropriate import address table entry to point at the desired interception function. There's no need to modify any code in either the caller or callee images. What could be easier?
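
As a rough illustration of that patching step, the sketch below models the import address table as a plain array of function pointers. Everything in it is made up for the example (the ImportedFn type, real_double, hooked_double, patch_iat_entry), and it assumes the table is already writeable; locating the right slot in a real image is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature shared by the imported function and its hook. */
typedef int (*ImportedFn)(int);

/* Original target, saved so the hook can still call through to it. */
ImportedFn saved_target;

/* Stands in for the function exported by the other DLL. */
int real_double(int x) { return 2 * x; }

/* Interception function: calls the original, then adjusts the result. */
int hooked_double(int x) { return saved_target(x) + 1; }

/* Overwrite one import address table slot and return the old target. */
ImportedFn patch_iat_entry(ImportedFn *iat, size_t index, ImportedFn hook)
{
    ImportedFn original;

    assert(iat != NULL && hook != NULL);
    original = iat[index];
    iat[index] = hook;
    return original;
}
```

A caller would store the return value of patch_iat_entry in saved_target; from then on every call made through that slot lands in hooked_double, while the code of both the caller and the callee stays untouched.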

It's interesting to note that in Microsoft-produced PE files, the import table is not something wholly synthesized by the linker. All the pieces necessary to call a function in another DLL reside in an import library. When you link a DLL, the library manager (LIB32.EXE or LIB.EXE) scans the OBJ files being linked and creates an import library. This import library is completely different from the import libraries used by 16-bit NE file linkers. The import library that the 32-bit LIB produces has a .text section and several .idata$ sections. The .text section in the import library contains the JMP DWORD PTR [XXXXXXXX] thunk, which has a name stored for it in the OBJ's symbol table. The name of the symbol is identical to the name of the function being exported by the DLL (for example, _Dispatch_Message@4). One of the .idata$ sections in the import library contains the DWORD that the thunk dereferences through. Another of the .idata$ sections has a space for the hint ordinal followed by the imported function's name. These two fields make up an IMAGE_IMPORT_BY_NAME structure. When you later link a PE file that uses the import library, the import library's sections are added to the list of sections from your OBJs that the linker needs to process. Since the thunk in the import library has the same name as the function being imported, the linker assumes the thunk is really the imported function, and fixes up calls to the imported function to point at the thunk. The thunk in the import library is essentially "seen" as the imported function.

Besides providing the code portion of an imported function thunk, the import library provides the pieces of the PE file's .idata section (or import table). These pieces come from the various .idata$ sections that the library manager put into the import library. In short, the linker doesn't really know the differences between imported functions and functions that appear in a different OBJ file. The linker just follows its preset rules for building and combining sections, and everything falls into place naturally.

PE File Exports

The opposite of importing a function is exporting a function for use by EXEs or other DLLs. A PE file stores information about its exported functions in the .edata section. Generally, Microsoft linker-generated PE EXE files don't export anything, so they don't have an .edata section. Borland's TLINK32 always exports at least one symbol from an EXE. Most DLLs do export functions and have an .edata section. The primary components of an .edata section (aka the export table) are tables of function names, entry point addresses, and export ordinal values. In an NE file, the equivalents of an export table are the entry table, the resident names table, and the nonresident names table. These tables are stored as part of the NE header, rather than in distinct segments or resources.

At the start of an .edata section is an IMAGE_EXPORT_DIRECTORY structure (see Table 10). This structure is immediately followed by data pointed to by fields in the structure.

Table 10. IMAGE_EXPORT_DIRECTORY Format

DWORD Characteristics

This field appears to be unused and is always set to 0.

DWORD TimeDateStamp

The time/date stamp indicating when this file was created.

WORD MajorVersion

WORD MinorVersion

These fields appear to be unused and are set to 0.

DWORD Name

The RVA of an ASCIIZ string with the name of this DLL.

DWORD Base

The starting ordinal number for exported functions. For example, if the file exports functions with ordinal values of 10, 11, and 12, this field contains 10. To obtain the exported ordinal for a function, you need to add this value to the appropriate element of the AddressOfNameOrdinals array.

DWORD NumberOfFunctions

The number of elements in the AddressOfFunctions array. This value is also the number of functions exported by this module. Theoretically, this value could be different than the NumberOfNames field (next), but actually they're always the same.

DWORD NumberOfNames

The number of elements in the AddressOfNames array. This value seems always to be identical to the NumberOfFunctions field, and so is the number of exported functions.

PDWORD *AddressOfFunctions

This field is an RVA and points to an array of function addresses. The function addresses are the entry points (RVAs) for each exported function in this module.

PDWORD *AddressOfNames

This field is an RVA and points to an array of string pointers. The strings are the names of the exported functions in this module.

PWORD *AddressOfNameOrdinals

This field is an RVA and points to an array of WORDs. The WORDs are the export ordinals of all the exported functions in this module. However, don't forget to add in the starting ordinal number specified in the Base field.

The layout of the export table is somewhat odd (see Figure 4 and Table 11). As I mentioned earlier, the requirements for exporting a function are a name, an address, and an export ordinal. You'd think that the designers of the PE format would have put all three of these items into a structure, and then have an array of these structures. Instead, each component of an exported entry is an element in an array. There are three of these arrays (AddressOfFunctions, AddressOfNames, AddressOfNameOrdinals), and they are all parallel to one another. To find all the information about the fourth function, you need to look up the fourth element in each array.

Figure 4. Export table layout

Table 11. Typical Exports Table from an EXE File

Name:            KERNEL32.dll

 Characteristics: 00000000

 TimeDateStamp:   2C4857D3

 Version:         0.00

 Ordinal base:    00000001

  # of functions: 0000021F

  # of Names:     0000021F

  EntryPt  Ordn Name

 00005090     1  AddAtomA

 00005100     2  AddAtomW

 00025540     3  AddConsoleAliasA

 00025500     4  AddConsoleAliasW

 00026AC0     5  AllocConsole

 00001000     6  BackupRead

 00001E90     7  BackupSeek

 00002100     8  BackupWrite

 0002520C     9  BaseAttachCompleteThunk

 00024C50    10  BasepDebugDump

  // Rest of table omitted...
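
To make the three-array lookup concrete, here is a minimal sketch of finding an export by name. The ExportView type and find_export_by_name are hypothetical, and the arrays are assumed to have been converted from RVAs to ordinary pointers already. One assumption goes slightly beyond the text above: the WORD found through AddressOfNameOrdinals is used as the index into the function address array, which gives the same result as reading the same position in all three arrays whenever names and ordinals run in the same order, as they do in Table 11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of the export directory's arrays. */
typedef struct {
    uint32_t           base;                /* Base field              */
    uint32_t           number_of_functions; /* NumberOfFunctions field */
    uint32_t           number_of_names;     /* NumberOfNames field     */
    const uint32_t    *functions;           /* entry point RVAs        */
    const char *const *names;               /* exported names          */
    const uint16_t    *name_ordinals;       /* WORD per exported name  */
} ExportView;

/* Returns the entry point RVA for `name`, or 0 if it is not exported.
   If `ordinal` is not NULL, it receives Base plus the ordinal WORD. */
uint32_t find_export_by_name(const ExportView *ev, const char *name,
                             uint32_t *ordinal)
{
    uint32_t i;

    assert(ev != NULL && name != NULL);
    for (i = 0; i < ev->number_of_names; i++) {
        if (strcmp(ev->names[i], name) == 0) {
            uint16_t index = ev->name_ordinals[i];
            if (index >= ev->number_of_functions)
                return 0; /* malformed table */
            if (ordinal != NULL)
                *ordinal = ev->base + index;
            return ev->functions[index];
        }
    }
    return 0;
}
```

Fed the first rows of Table 11 with an ordinal base of 1, a lookup of AddAtomW would yield entry point 00005100 and ordinal 2.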

Incidentally, if you dump out the exports from the Windows NT system DLLs (for example, KERNEL32.DLL and USER32.DLL), you'll note that in many cases there are two functions that only differ by one character at the end of the name, for instance CreateWindowExA and CreateWindowExW. This is how UNICODE support is implemented transparently. The functions that end with A are the ASCII (or ANSI) compatible functions, while those ending in W are the UNICODE version of the function. In your code, you don't explicitly specify which function to call. Instead, the appropriate function is selected in WINDOWS.H, via preprocessor #ifdefs. This excerpt from the Windows NT WINDOWS.H shows an example of how this works:

#ifdef UNICODE

#define DefWindowProc  DefWindowProcW

#else

#define DefWindowProc  DefWindowProcA

#endif // !UNICODE

PE File Resources

Finding resources in a PE file is quite a bit more complicated than in an NE file. The formats of the individual resources (for example, a menu) haven't changed significantly, but you need to traverse a strange hierarchy to find them.

Navigating the resource directory hierarchy is like navigating a hard disk. There's a master directory (the root directory), which has subdirectories. The subdirectories have subdirectories of their own that may point to the raw resource data for things like dialog templates. In the PE format, both the root directory of the resource directory hierarchy and all of its subdirectories are structures of type IMAGE_RESOURCE_DIRECTORY (see Table 12).

Table 12. IMAGE_RESOURCE_DIRECTORY Format

DWORD Characteristics

Theoretically this field could hold flags for the resource, but appears to always be 0.

DWORD TimeDateStamp

The time/date stamp describing the creation time of the resource.

WORD MajorVersion

WORD MinorVersion

Theoretically these fields would hold a version number for the resource. These fields appear to always be set to 0.

WORD NumberOfNamedEntries

The number of array elements that use names and that follow this structure.

WORD NumberOfIdEntries

The number of array elements that use integer IDs, and which follow this structure.

IMAGE_RESOURCE_DIRECTORY_ENTRY DirectoryEntries[]

This field isn't really part of the IMAGE_RESOURCE_DIRECTORY structure. Rather, it's an array of IMAGE_RESOURCE_DIRECTORY_ENTRY structures that immediately follow the IMAGE_RESOURCE_DIRECTORY structure. The number of elements in the array is the sum of the NumberOfNamedEntries and NumberOfIdEntries fields. The directory entry elements that have name identifiers (rather than integer IDs) come first in the array.

A directory entry can either point at a subdirectory (that is, to another IMAGE_RESOURCE_DIRECTORY), or it can point to the raw data for a resource. Generally, there are at least three directory levels before you get to the actual raw resource data. The top-level directory (of which there's only one) is always found at the beginning of the resource section (.rsrc). The subdirectories of the top-level directory correspond to the various types of resources found in the file. For example, if a PE file includes dialogs, string tables, and menus, there will be three subdirectories: a dialog directory, a string table directory, and a menu directory. Each of these type subdirectories will in turn have ID subdirectories. There will be one ID subdirectory for each instance of a given resource type. In the above example, if there are three dialog boxes, the dialog directory will have three ID subdirectories. Each ID subdirectory will have either a string name (such as "MyDialog") or the integer ID used to identify the resource in the RC file. Figure 5 shows a resource directory hierarchy example in visual form. Table 13 shows the PEDUMP output for the resources in the Windows NT CLOCK.EXE.

Figure 5. Resource directory hierarchy

Table 13. Resources Hierarchy for CLOCK.EXE

ResDir (0) Named:00 ID:06 TimeDate:2C3601DB Vers:0.00 Char:0

   ResDir (ICON) Named:00 ID:02 TimeDate:2C3601DB Vers:0.00 Char:0

       ResDir (1) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0

           ID: 00000409  Offset: 00000200

       ResDir (2) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0

           ID: 00000409  Offset: 00000210

   ResDir (MENU) Named:02 ID:00 TimeDate:2C3601DB Vers:0.00 Char:0

       ResDir (CLOCK) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0

           ID: 00000409  Offset: 00000220

       ResDir (GENERICMENU) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0

           ID: 00000409  Offset: 00000230

   ResDir (DIALOG) Named:01 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0

       ResDir (ABOUTBOX) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0

           ID: 00000409  Offset: 00000240

       ResDir (64) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0

           ID: 00000409  Offset: 00000250

   ResDir (STRING) Named:00 ID:03 TimeDate:2C3601DB Vers:0.00 Char:0

       ResDir (1) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0

           ID: 00000409  Offset: 00000260

       ResDir (2) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0

           ID: 00000409  Offset: 00000270

       ResDir (3) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0

           ID: 00000409  Offset: 00000280

   ResDir (GROUP_ICON) Named:01 ID:00 TimeDate:2C3601DB Vers:0.00 Char:0

       ResDir (CCKK) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0

           ID: 00000409  Offset: 00000290

   ResDir (VERSION) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0

       ResDir (1) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0

           ID: 00000409  Offset: 000002A0

As mentioned earlier, each directory entry is a structure of type IMAGE_RESOURCE_DIRECTORY_ENTRY (boy, these names are getting long!). Each IMAGE_RESOURCE_DIRECTORY_ENTRY has the format shown in Table 14.

Table 14. IMAGE_RESOURCE_DIRECTORY_ENTRY Format

DWORD Name

This field contains either an integer ID or a pointer to a structure that contains a string name. If the high bit (0x80000000) is zero, this field is interpreted as an integer ID. If the high bit is nonzero, the lower 31 bits are an offset (relative to the start of the resources) to an IMAGE_RESOURCE_DIR_STRING_U structure. This structure contains a WORD character count, followed by a UNICODE string with the resource name. Yes, even PE files intended for non-UNICODE Win32 implementations use UNICODE here. To convert the UNICODE string to an ANSI string, use the WideCharToMultiByte function.

DWORD OffsetToData

This field is either an offset to another resource directory or a pointer to information about a specific resource instance. If the high bit (0x80000000) is set, this directory entry refers to a subdirectory. The lower 31 bits are an offset (relative to the start of the resources) to another IMAGE_RESOURCE_DIRECTORY. If the high bit isn't set, the lower 31 bits point to an IMAGE_RESOURCE_DATA_ENTRY structure. The IMAGE_RESOURCE_DATA_ENTRY structure contains the location of the resource's raw data, its size, and its code page.
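
The high-bit tests on both fields are easy to get backwards, so here is a small sketch of them in C. The ResDirEntry type and the helper names are hypothetical stand-ins for IMAGE_RESOURCE_DIRECTORY_ENTRY, not the WINNT.H declarations; only the bit layout described above is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RES_HIGH_BIT 0x80000000u
#define RES_LOW_31   0x7FFFFFFFu

/* Hypothetical mirror of the two DWORDs in a directory entry. */
typedef struct {
    uint32_t name;
    uint32_t offset_to_data;
} ResDirEntry;

/* True if Name holds an offset to a string rather than an integer ID. */
bool res_entry_is_named(const ResDirEntry *entry)
{
    assert(entry != NULL);
    return (entry->name & RES_HIGH_BIT) != 0;
}

/* The integer ID, or the offset of the name string from the start of
   the resources when the entry is named. */
uint32_t res_entry_name_value(const ResDirEntry *entry)
{
    assert(entry != NULL);
    return res_entry_is_named(entry) ? (entry->name & RES_LOW_31)
                                     : entry->name;
}

/* True if OffsetToData leads to another directory, not a data entry. */
bool res_entry_is_subdirectory(const ResDirEntry *entry)
{
    assert(entry != NULL);
    return (entry->offset_to_data & RES_HIGH_BIT) != 0;
}

/* Offset of the subdirectory or data entry from the start of the
   resources. */
uint32_t res_entry_target_offset(const ResDirEntry *entry)
{
    assert(entry != NULL);
    return entry->offset_to_data & RES_LOW_31;
}
```

A recursive dump along the lines of Table 13 would call res_entry_is_subdirectory on each entry and descend until it returns false, at which point res_entry_target_offset locates the data entry.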

To go further into the resource formats, I'd need to discuss the format of each resource type (dialogs, menus, and so on). Covering these topics could easily fill up an entire article on its own.

PE File Base Relocations

When the linker creates an EXE file, it makes an assumption about where the file will be mapped into memory. Based on this, the linker puts the real addresses of code and data items into the executable file. If for whatever reason the executable ends up being loaded somewhere else in the virtual address space, the addresses the linker plugged into the image are wrong. The information stored in the .reloc section allows the PE loader to fix these addresses in the loaded image so that they're correct again. On the other hand, if the loader was able to load the file at the base address assumed by the linker, the .reloc section data isn't needed and is ignored. The entries in the .reloc section are called base relocations since their use depends on the base address of the loaded image.

Unlike relocations in the NE file format, base relocations are extremely simple. They boil down to a list of locations in the image that need a value added to them. The format of the base relocation data is somewhat quirky. The base relocation entries are packaged in a series of variable length chunks. Each chunk describes the relocations for one 4KB page in the image. Let's look at an example to see how base relocations work. An executable file is linked assuming a base address of 0x10000. At offset 0x2134 within the image is a pointer containing the address of a string. The string starts at virtual address 0x14002, so the pointer contains the value 0x14002. You then load the file, but the loader decides that it needs to map the image starting at virtual address 0x60000. The difference between the linker-assumed base load address and the actual load address is called the delta. In this case, the delta is 0x50000. Since the entire image is 0x50000 bytes higher in memory, so is the string (now at address 0x64002). The pointer to the string is now incorrect. The executable file contains a base relocation for the memory location where the pointer to the string resides. To resolve a base relocation, the loader adds the delta value to the original value at the base relocation address. In this case, the loader would add 0x50000 to the original pointer value (0x14002), and store the result (0x64002) back into the pointer's memory. Since the string really is at 0x64002, everything is fine with the world.
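
The arithmetic in this example can be checked with a few lines of C. The sketch below is only an illustration: apply_highlow_fixup is a hypothetical name, the image is modeled as a plain byte buffer, and the DWORD is read and written in the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add `delta` to the 32-bit value stored at `offset` within the image. */
void apply_highlow_fixup(uint8_t *image, size_t image_size,
                         uint32_t offset, uint32_t delta)
{
    uint32_t value;

    assert(image != NULL);
    assert(offset + sizeof value <= image_size);
    memcpy(&value, image + offset, sizeof value); /* read the old pointer */
    value += delta;                               /* rebase it            */
    memcpy(image + offset, &value, sizeof value); /* store it back        */
}
```

With the numbers from the text, the delta is 0x60000 - 0x10000 = 0x50000, and fixing up the DWORD at offset 0x2134 turns the stored 0x14002 into 0x64002.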

Each chunk of base relocation data begins with an IMAGE_BASE_RELOCATION structure that looks like Table 15. Table 16 shows some base relocations as shown by PEDUMP. Note that the RVA values shown have already been displaced by the VirtualAddress field in the IMAGE_BASE_RELOCATION structure.

Table 15. IMAGE_BASE_RELOCATION Format

DWORD VirtualAddress

This field contains the starting RVA for this chunk of relocations. The offset of each relocation that follows is added to this value to form the actual RVA where the relocation needs to be applied.

DWORD SizeOfBlock

The size of this structure plus all the WORD relocations that follow. To determine the number of relocations in this block, subtract the size of an IMAGE_BASE_RELOCATION (8 bytes) from the value of this field, and then divide by 2 (the size of a WORD). For example, if this field contains 44, there are 18 relocations that immediately follow:

 (44 - sizeof(IMAGE_BASE_RELOCATION)) / sizeof(WORD) = 18

WORD TypeOffset

This isn't just a single WORD, but rather an array of WORDs, the number of which is calculated by the above formula. The bottom 12 bits of each WORD are a relocation offset, and need to be added to the value of the VirtualAddress field from this relocation block's header. The high 4 bits of each WORD are a relocation type. For PE files that run on Intel CPUs, you'll only see two types of relocations:

0 IMAGE_REL_BASED_ABSOLUTE This relocation is meaningless and is only used as a placeholder to round relocation blocks up to a DWORD multiple size.
3 IMAGE_REL_BASED_HIGHLOW This relocation means add both the high and low 16 bits of the delta to the DWORD specified by the calculated RVA.

Table 16. The Base Relocations from an EXE File

Virtual Address: 00001000  size: 0000012C

 00001032 HIGHLOW

 0000106D HIGHLOW

 000010AF HIGHLOW

 000010C5 HIGHLOW

  // Rest of chunk omitted...

Virtual Address: 00002000  size: 0000009C

 000020A6 HIGHLOW

 00002110 HIGHLOW

 00002136 HIGHLOW

 00002156 HIGHLOW

  // Rest of chunk omitted...

Virtual Address: 00003000  size: 00000114

 0000300A HIGHLOW

 0000301E HIGHLOW

 0000303B HIGHLOW

 0000306A HIGHLOW

  // Rest of relocations omitted...
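
The bookkeeping for one relocation chunk comes down to three small calculations, sketched here in C. The function names and the BASE_RELOC_HEADER_SIZE constant are hypothetical; the 8-byte header size, the 12-bit offset, and the 4-bit type all come from the field descriptions above.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the VirtualAddress and SizeOfBlock fields that start a chunk. */
#define BASE_RELOC_HEADER_SIZE 8u

/* Number of TypeOffset WORDs that follow the chunk header. */
uint32_t reloc_count(uint32_t size_of_block)
{
    assert(size_of_block >= BASE_RELOC_HEADER_SIZE);
    return (size_of_block - BASE_RELOC_HEADER_SIZE) / 2u;
}

/* Relocation type: the high 4 bits of a TypeOffset WORD. */
unsigned reloc_type(uint16_t type_offset)
{
    return (unsigned)(type_offset >> 12);
}

/* RVA to fix up: the chunk's VirtualAddress plus the low 12 bits. */
uint32_t reloc_rva(uint32_t virtual_address, uint16_t type_offset)
{
    return virtual_address + (type_offset & 0x0FFFu);
}
```

For instance, a TypeOffset WORD of 0x3032 in the chunk at 00001000 would decode to type 3 (HIGHLOW) at RVA 00001032, which is how the first line of Table 16 reads. The raw WORD value is inferred from the PEDUMP output rather than taken from the file.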

Differences Between PE and COFF OBJ Files

There are two portions of the PE file that are not used by the operating system. These are the COFF symbol table and the COFF debug information. Why would anyone need COFF debug information when the much more complete CodeView information is available? If you intend to use the Windows NT system debugger (NTSD) or the Windows NT kernel debugger (KD), COFF is the only game in town. For those of you who are interested, I've included a detailed description of these parts of the PE file in the online posting that accompanies this article (available on all MSJ bulletin boards).

At many points throughout the preceding discussion, I've noted that many structures and tables are the same in both a COFF OBJ file and the PE file created from it. Both COFF OBJ and PE files have an IMAGE_FILE_HEADER at or near their beginning. This header is followed by a section table that contains information about all the sections in the file. The two formats also share the same line number and symbol table formats, although the PE file can have additional non-COFF symbol tables as well. The amount of commonality between the OBJ and PE EXE formats is evidenced by the large amount of common code in PEDUMP (see COMMON.C on any MSJ bulletin board).

This similarity between the two file formats isn't happenstance. The goal of this design is to make the linker's job as easy as possible. Theoretically, creating an EXE file from a single OBJ should be just a matter of inserting a few tables and modifying a couple of file offsets within the image. With this in mind, you can think of a COFF file as an embryonic PE file. Only a few things are missing or different, so I'll list them here.

·        COFF OBJ files don't have an MS-DOS stub preceding the IMAGE_FILE_HEADER, nor is there a "PE" signature preceding the IMAGE_FILE_HEADER.

·        OBJ files don't have the IMAGE_OPTIONAL_HEADER. In a PE file, this structure immediately follows the IMAGE_FILE_HEADER. Interestingly, COFF LIB files do have an IMAGE_OPTIONAL_HEADER. Space constraints prevent me from talking about LIB files here.

·        OBJ files don't have base relocations. Instead, they have regular symbol-based fixups. I haven't gone into the format of the COFF OBJ file relocations because they're fairly obscure. If you want to dig into this particular area, the PointerToRelocations and NumberOfRelocations fields in the section table entries point to the relocations for each section. The relocations are an array of IMAGE_RELOCATION structures, which is defined in WINNT.H. The PEDUMP program can show OBJ file relocations if you enable the proper switch.

·        The CodeView information in an OBJ file is stored in two sections (.debug$S and .debug$T). When the linker processes the OBJ files, it doesn't put these sections in the PE file. Instead, it collects all these sections and builds a single symbol table stored at the end of the file. This symbol table isn't a formal section (that is, there's no entry for it in the PE's section table).
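
The first difference in the list gives a dump tool a cheap way to tell the two kinds of file apart. The sketch below is not taken from PEDUMP; looks_like_pe_image is a hypothetical helper. It assumes the standard MS-DOS header layout, in which the DWORD at offset 0x3C (e_lfanew) holds the file offset of the "PE" signature, a detail that comes from the header format rather than from this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DOS_HEADER_SIZE 0x40u /* size of the MS-DOS header           */
#define E_LFANEW_OFFSET 0x3Cu /* where the PE header offset is stored */

/* Returns 1 if the buffer starts with an MS-DOS stub that leads to a
   "PE\0\0" signature, 0 otherwise (as for a COFF OBJ file). */
int looks_like_pe_image(const uint8_t *data, size_t size)
{
    uint32_t pe_offset;

    assert(data != NULL);
    if (size < DOS_HEADER_SIZE || data[0] != 'M' || data[1] != 'Z')
        return 0;
    /* Assemble the little-endian DWORD byte by byte. */
    pe_offset = (uint32_t)data[E_LFANEW_OFFSET]
              | ((uint32_t)data[E_LFANEW_OFFSET + 1] << 8)
              | ((uint32_t)data[E_LFANEW_OFFSET + 2] << 16)
              | ((uint32_t)data[E_LFANEW_OFFSET + 3] << 24);
    if (pe_offset > size || size - pe_offset < 4)
        return 0;
    return memcmp(data + pe_offset, "PE\0\0", 4) == 0;
}
```

An OBJ file fails the very first test, because it begins directly with the IMAGE_FILE_HEADER rather than with the "MZ" bytes of an MS-DOS stub.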

Using PEDUMP

PEDUMP is a command-line utility for dumping PE files and COFF OBJ format files. It uses the Win32 console capabilities to eliminate the need for extensive user interface work. The syntax for PEDUMP is as follows:

PEDUMP [switches] filename

The switches can be seen by running PEDUMP with no arguments. PEDUMP uses the switches shown in Table 17. By default, none of the switches are enabled. Running PEDUMP without any of the switches provides most of the useful information without creating a huge amount of output. PEDUMP sends its output to the standard output file, so its output can be redirected to a file with a > on the command line.

Table 17. PEDUMP Switches

/A Include everything in dump (essentially, enable all the switches)
/H Include a hex dump of each section at the end of the dump
/L Include line number information (both PE and COFF OBJ files)
/R Show base relocations (PE files only)
/S Show symbol table (both PE and COFF OBJ files)

Summary

With the advent of Win32, Microsoft made sweeping changes in the OBJ and executable file formats to save time and build on work previously done for other operating systems. A primary goal of these file formats is to enhance portability across different platforms.