
Compilation Basics: From hello.c to the hello Executable

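Contents

    • Compilation steps
    • Step-by-step compilation
    • Stopping at a specific stage
    • gcc -E -S -c
    • What is in HelloWorld.i, HelloWorld.s, HelloWorld.o and HelloWorld?
      • HelloWorld.i: the preprocessed file
      • HelloWorld.s: the assembly code file
      • HelloWorld.o: the non-executable binary (object file)
      • HelloWorld: the executable binary
      • gcc options you may find useful: -g and -masm
        • gcc -masm: choose the assembly syntax
        • gcc -g: add debugging information to the compiled output
    • The objdump disassembler
      • Cleaning up objdump output on macOS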

Compilation steps

The process can be divided into four stages: preprocessing -> compilation -> assembly -> linking.

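Preprocessing: inserts the contents of header files and expands macros.
Compilation: runs preprocessing first, then translates the C program into an assembly program.
Assembly: runs preprocessing and compilation, then translates the assembly program into a linkable binary (an object file).
Linking: runs all of the above, then links the object file together with other libraries to produce an executable program file.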
           

Step-by-step compilation

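Preprocess (source file -> preprocessed file):            gcc -E HelloWorld.c -o HelloWorld.i
Compile (preprocessed file -> assembly code file):        gcc -S HelloWorld.i -o HelloWorld.s
Assemble (assembly code file -> non-executable binary):   gcc -c HelloWorld.s -o HelloWorld.o
Link (non-executable binary -> executable binary):        gcc HelloWorld.o -o HelloWorld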

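Note: why can't the non-executable binary (HelloWorld.o) be run? Because it has not yet been processed by the linker. References to external symbols such as printf are still unresolved placeholders, and the file is a relocatable object rather than a program the operating system can load and start.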
           

Stopping at a specific stage

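Stop after preprocessing -> preprocessed file:    gcc -E HelloWorld.c -o HelloWorld.i
Stop after compilation -> assembly code file:     gcc -S HelloWorld.c -o HelloWorld.s
Stop after assembly -> non-executable binary:     gcc -c HelloWorld.c -o HelloWorld.o
Run all stages -> executable binary:              gcc HelloWorld.c -o HelloWorld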
           

The compilation process is shown in the figure below:

[Figure: the compilation process from hello.c to the hello executable]

gcc -E -S -c

-E                      Only run the preprocessor
-S                      Only run preprocess and compilation steps
-c                      Only run preprocess, compile, and assemble steps
           

What is in HelloWorld.i, HelloWorld.s, HelloWorld.o and HelloWorld?

The following program, HelloWorld.c, is used as the source file for the rest of this post:

#include <stdio.h>

int main(int argc, char const *argv[])
{
    int a=1;
    int b=2;
    int c=3;
    printf("Hello World!\n");
    return 0;
}
           

HelloWorld.i: the preprocessed file

# 1 "HelloWorld.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 361 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "HelloWorld.c" 2

# 1 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/stdio.h" 1 3 4
# 64 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/stdio.h" 3 4
# 1 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/_stdio.h" 1 3 4
# 68 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/_stdio.h" 3 4
# 1 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/sys/cdefs.h" 1 3 4
# 608 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/sys/cdefs.h" 3 4
# 1 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/sys/_symbol_aliasing.h" 1 3 4
# 609 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/sys/cdefs.h" 2 3 4
# 674 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/sys/cdefs.h" 3 4
# 1 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/sys/_posix_availability.h" 1 3 4
# 675 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/sys/cdefs.h" 2 3 4
# 69 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/_stdio.h" 2 3 4
# 1 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/Availability.h" 1 3 4
# 242 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/Availability.h" 3 4
# 1 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/AvailabilityInternal.h" 1 3 4
# 243 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/Availability.h" 2 3 4
# 70 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/_stdio.h" 2 3 4

# 1 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/_types.h" 1 3 4
# 27 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/_types.h" 3 4
# 1 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/sys/_types.h" 1 3 4
# 33 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/sys/_types.h" 3 4
# 1 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/machine/_types.h" 1 3 4
# 32 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/machine/_types.h" 3 4
# 1 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/i386/_types.h" 1 3 4
# 37 "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/i386/_types.h" 3 4
... (a large amount of output omitted)

__attribute__((__availability__(swift, unavailable, message="Use mkstemp(3) instead.")))

__attribute__((deprecated("This function is provided for compatibility reasons only.  Due to security concerns inherent in the design of tempnam(3), it is highly recommended that you use mkstemp(3) instead.")))

char *tempnam(const char *__dir, const char *__prefix) __asm("_" "tempnam" );

int main(int argc, char const *argv[])
{
    int a=1;
    int b=2;
    int c=3;
  printf("Hello World!\n");
  return 0;
}

           

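As the output shows, HelloWorld.i contains the text of the header stdio.h (and of every header it includes in turn) pasted in ahead of our own code, which appears unchanged at the end.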

HelloWorld.s: the assembly code file

.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 10, 14	sdk_version 10, 14
	.globl	_main                   ## -- Begin function main
	.p2align	4, 0x90
_main:                                  ## @main
	.cfi_startproc
## %bb.0:
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	subq	$32, %rsp
	movl	$0, -4(%rbp)
	movl	%edi, -8(%rbp)
	movq	%rsi, -16(%rbp)
	movl	$1, -20(%rbp)
	movl	$2, -24(%rbp)
	movl	$3, -28(%rbp)
	leaq	L_.str(%rip), %rdi
	movb	$0, %al
	callq	_printf
	xorl	%ecx, %ecx
	movl	%eax, -32(%rbp)         ## 4-byte Spill
	movl	%ecx, %eax
	addq	$32, %rsp
	popq	%rbp
	retq
	.cfi_endproc
                                        ## -- End function
	.section	__TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
	.asciz	"Hello World!\n"


.subsections_via_symbols

           

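This is assembly code in AT&T syntax: the source operand comes first and the destination second, registers are prefixed with %, immediate values with $, and mnemonics carry size suffixes (q for 64-bit, l for 32-bit, b for 8-bit).

HelloWorld.o: the non-executable binary (object file)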

Offset: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 	
00000000: CF FA ED FE 07 00 00 01 03 00 00 00 01 00 00 00    Ozm~............
00000010: 04 00 00 00 08 02 00 00 00 20 00 00 00 00 00 00    ................
00000020: 19 00 00 00 88 01 00 00 00 00 00 00 00 00 00 00    ................

...

00000300: 00 00 00 00 00 00 00 00 07 00 00 00 01 00 00 00    ................
00000310: 00 00 00 00 00 00 00 00 00 5F 6D 61 69 6E 00 5F    ........._main._
00000320: 70 72 69 6E 74 66 00 00                            printf..

           

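The code section of this file holds the machine instructions that the CPU will eventually read and execute, but the file also contains a Mach-O header (note the bytes CF FA ED FE at offset 0) and symbol information: the names _main and _printf appear at the end of the dump.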

HelloWorld: the executable binary

Offset: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 	
00000000: CF FA ED FE 07 00 00 01 03 00 00 80 02 00 00 00    Ozm~............
00000010: 0F 00 00 00 C0 04 00 00 85 00 20 00 00 00 00 00    ....@...........
00000020: 19 00 00 00 48 00 00 00 5F 5F 50 41 47 45 5A 45    ....H...__PAGEZE
00000030: 52 4F 00 00 00 00 00 00 00 00 00 00 00 00 00 00    RO..............
....
00001fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00001fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00001ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00002000: 11 22 10 51 00 00 00 00 11 40 64 79 6C 64 5F 73    .".Q.....@dyld_s
00002010: 74 75 62 5F 62 69 6E 64 65 72 00 51 72 00 90 00    tub_binder.Qr...
00002020: 72 10 11 40 5F 70 72 69 6E 74 66 00 90 00 00 00    r..@_printf.....
00002030: 00 01 5F 00 05 00 02 5F 6D 68 5F 65 78 65 63 75    .._...._mh_execu
00002040: 74 65 5F 68 65 61 64 65 72 00 21 6D 61 69 6E 00    te_header.!main.
00002050: 25 02 00 00 00 03 00 C0 1E 00 00 00 00 00 00 00    %......@........
00002060: C0 1E 00 00 00 00 00 00 02 00 00 00 0F 01 10 00    @...............
00002070: 00 00 00 00 01 00 00 00 16 00 00 00 0F 01 00 00    ................
00002080: 40 0F 00 00 01 00 00 00 1C 00 00 00 01 00 00 01    @...............
00002090: 00 00 00 00 00 00 00 00 24 00 00 00 01 00 00 01    ........$.......
000020a0: 00 00 00 00 00 00 00 00 02 00 00 00 03 00 00 00    ................
000020b0: 00 00 00 40 02 00 00 00 20 00 5F 5F 6D 68 5F 65    ...@......__mh_e
000020c0: 78 65 63 75 74 65 5F 68 65 61 64 65 72 00 5F 6D    xecute_header._m
000020d0: 61 69 6E 00 5F 70 72 69 6E 74 66 00 64 79 6C 64    ain._printf.dyld
000020e0: 5F 73 74 75 62 5F 62 69 6E 64 65 72 00 00 00 00    _stub_binder....
           

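The last offset in the HelloWorld.o dump above is 00000320, while the HelloWorld executable's dump runs to 000020e0. Adding the bytes on each last row, and assuming the executable's dump runs to the end of the file, HelloWorld.o is 0x328 = 808 bytes and HelloWorld is 0x20F0 = 8432 bytes.

The executable is clearly much larger than HelloWorld.o. Note, though, that the code of printf is not copied in: it lives in the system's shared C library and is bound at load time by the dynamic linker dyld (the strings dyld_stub_binder and _printf in the dump belong to that binding information). Much of the extra size comes from the load commands the linker adds, from segments padded out to page boundaries (the long runs of 00 bytes before offset 0x2000), and from the dynamic-linking metadata.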

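That completes the walk-through from HelloWorld.c to the HelloWorld executable. It was quite fun, and I feel I learned a lot.

Next, let's play with disassembly.

gcc options you may find useful: -g and -masm

I came across these two options, -g and -masm, by accident.

gcc -masm: choose the assembly syntax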

$ gcc -S -masm=intel HelloWorld.c -o HelloWorld.s

.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 10, 14	sdk_version 10, 14
	.intel_syntax noprefix
	.globl	_main                   ## -- Begin function main
	.p2align	4, 0x90
_main:                                  ## @main
	.cfi_startproc
## %bb.0:
	push	rbp
	.cfi_def_cfa_offset 16
	.cfi_offset rbp, -16
	mov	rbp, rsp
	.cfi_def_cfa_register rbp
	sub	rsp, 32
	mov	dword ptr [rbp - 4], 0
	mov	dword ptr [rbp - 8], edi
	mov	qword ptr [rbp - 16], rsi
	mov	dword ptr [rbp - 20], 1
	mov	dword ptr [rbp - 24], 2
	mov	dword ptr [rbp - 28], 3
	lea	rdi, [rip + L_.str]
	mov	al, 0
	call	_printf
	xor	ecx, ecx
	mov	dword ptr [rbp - 32], eax ## 4-byte Spill
	mov	eax, ecx
	add	rsp, 32
	pop	rbp
	ret
	.cfi_endproc
                                        ## -- End function
	.section	__TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
	.asciz	"Hello World!\n"


.subsections_via_symbols

           

gcc -g: add debugging information to the compiled output

softwaredeMacBook-Pro:gcc software$ gcc -c -g HelloWorld.c -o HelloWorld.o
           

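The objdump disassembler

objdump on macOS is not very friendly; it cost me two hours. I've summarized the painful journey below for your reference.

On macOS, objdump is LLVM's implementation, whereas on Linux (and in GNU toolchains on Windows) objdump is the GNU binutils version. Their options differ.

Documentation for LLVM's objdump: https://llvm.org/docs/CommandGuide/llvm-objdump.html

Documentation for GNU objdump: https://sourceware.org/binutils/docs/binutils/objdump.html

First, check objdump --version on macOS: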

softwaredeMacBook-Pro:~ software$ objdump --version
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
  Optimized build.
  Default target: x86_64-apple-darwin18.7.0
  Host CPU: skylake

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_be - AArch64 (big endian)
    arm        - ARM
    arm64      - ARM64 (little endian)
    armeb      - ARM (big endian)
    thumb      - Thumb
    thumbeb    - Thumb (big endian)
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64
           

We can now use objdump to disassemble the binary HelloWorld.o (or HelloWorld) back into assembly code:

softwaredeMacBook-Pro:gcc software$ objdump -d HelloWorld.o

HelloWorld.o:   file format Mach-O 64-bit x86-64

Disassembly of section __TEXT,__text:
_main:
       0:       55      pushq   %rbp
       1:       48 89 e5        movq    %rsp, %rbp
       4:       48 83 ec 20     subq    $32, %rsp
       8:       c7 45 fc 00 00 00 00    movl    $0, -4(%rbp)
       f:       89 7d f8        movl    %edi, -8(%rbp)
      12:       48 89 75 f0     movq    %rsi, -16(%rbp)
      16:       c7 45 ec 01 00 00 00    movl    $1, -20(%rbp)
      1d:       c7 45 e8 02 00 00 00    movl    $2, -24(%rbp)
      24:       c7 45 e4 03 00 00 00    movl    $3, -28(%rbp)
      2b:       48 8d 3d 14 00 00 00    leaq    20(%rip), %rdi
      32:       b0 00   movb    $0, %al
      34:       e8 00 00 00 00  callq   0 <_main+0x39>
      39:       31 c9   xorl    %ecx, %ecx
      3b:       89 45 e0        movl    %eax, -32(%rbp)
      3e:       89 c8   movl    %ecx, %eax
      40:       48 83 c4 20     addq    $32, %rsp
      44:       5d      popq    %rbp
      45:       c3      retq
           

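Hard on the eyes, isn't it? Yes, that's LLVM's objdump. Having stepped into the pit, let's find a way out.

First, the columns, from left to right:
_main: a label
0, 1, 4, 8: addresses (offsets from the start of the section)
55: the machine code bytes
pushq: the assembly instruction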
           

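So I decided to clean up this messy output. Two problems need solving: 1. switch the assembly syntax to Intel; 2. fix the misaligned columns.

Cleaning up objdump output on macOS

Run the following command (on hello.o, an object file compiled from the same source with -g, as described in the note below):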

objdump -d --no-show-raw-insn -S  -x86-asm-syntax=intel  hello.o 
           

Output:

hello.o:        file format Mach-O 64-bit x86-64

Disassembly of section __TEXT,__text:
_main:
; {
       0:       push    rbp
       1:       mov     rbp, rsp
       4:       sub     rsp, 32
       8:       mov     dword ptr [rbp - 4], 0
       f:       mov     dword ptr [rbp - 8], edi
      12:       mov     qword ptr [rbp - 16], rsi
; int a=1;
      16:       mov     dword ptr [rbp - 20], 1
; int b=2;
      1d:       mov     dword ptr [rbp - 24], 2
; int c=3;
      24:       mov     dword ptr [rbp - 28], 3
; printf("Hello World!\n");
      2b:       lea     rdi, [rip + 20]
      32:       mov     al, 0
      34:       call    0 <_main+0x39>
      39:       xor     ecx, ecx
; return 0;
      3b:       mov     dword ptr [rbp - 32], eax
      3e:       mov     eax, ecx
      40:       add     rsp, 32
      44:       pop     rbp
      45:       ret
           

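Much cleaner, isn't it? Ha ha. Here --no-show-raw-insn hides the machine-code bytes, -x86-asm-syntax=intel switches to Intel syntax, and -S interleaves the source code with the disassembly.

The first line, hello.o: file format Mach-O 64-bit x86-64, tells us the object file is in the 64-bit Mach-O format for x86-64.

Note:

gcc hello.c -g -c -o hello.o

Add -g so that debugging information is stored in hello.o. Without it objdump can still disassemble the code, but -S has no source lines to show alongside the instructions.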

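That wraps up this summary. My understanding is clearer now, and I'm one step closer to writing my own operating system.

That's the end of my post. If you're interested in writing an operating system yourself, you're welcome to follow my column on the subject so we can all learn and improve together.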
