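Contents
- Learning modern assembly
- Assembly syntax flavors
- Assemblers
- List of assemblers
- What an assembler does
- Linker
- ld (the linker)
- linker script
- Loader
- Executable file formats
- The objconv tool
- x86 assembly
- AT&T syntax
- Function definition
- AVX and SSE instructions
- x64 registers
- calling conv
- ia32
- Function calls
- stack unwinding
- return value
- nasm
- The nasm command
- gas
- Hello world
- armv8 assembly
- Registers Linux reads for CPU information
- Common terms
- Registers
- armv7 and aarch32
- aarch64
- calling conv
- Addressing modes
- arm inline assembly
- Addition example
- System calls
- Libraries
- x86 calling conventions revisited
- Linux basics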
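This article covers the basics of assembling and linking, and introduces some of the instruction syntax of x86-64 and armv8-a.
Learning modern assembly
The 80x86 assembly I learned earlier is too dated: with it I could not even read the code of the first version of the linux kernel. Here I organize some assembly knowledge, mainly for the x86 and armv8-a architectures. This article does not teach what specific instructions look like, nor how to write general-purpose assembly code. Learning assembly is not the end goal in itself. The goal is to build better software and high-performance libraries, so the article also covers how compiling and linking work.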
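Assembly syntax flavors
- AT&T
  - A syntax that came out of the lab of that name. Note that it has nothing to do with the instruction set; it is only a syntax.
  - For x86 processors, immediates are prefixed with $ and registers are referenced with %.
  - For arm processors, ARM's official syntax is used directly.
- intel
  - The x86 assembly syntax usually taught at university
  - Simple and easy to use
- An assembler translates assembly language into machine language, so different assembly languages may need different assemblers. Their machine language is the instructions themselves.
  - The GAS assembler can assemble x86 assembly, and it can also assemble arm.
  - The Intel assembler can only assemble Intel's assembly language, much like icc.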
Assemblers
List of assemblers
- AT&T assembler - as
- Borland's Turbo Assembler - TASM
- GNU assembler - gas, the one GCC uses by default
- Intel Assembler
- Microsoft Assembler - MASM
- Netwide Assembler - NASM
- Yet Another Assembler - YASM
References: x86 gas syntax; a short comparison of assemblers
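What an assembler does
- The assembler resolves all variables, constants, macros, labels, and so on in the current source file and generates the corresponding symbol table. This resolution is preliminary and the symbol table is incomplete; the linker takes over later. Resolving means determining their addresses. These addresses are relative addresses computed with respect to the current module; absolute addresses can only be computed at link time.
- The assembler's output is an object file.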
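The split between module-relative offsets and final addresses can be sketched in C. This is only a toy model under stated assumptions: the `symbol` struct, the function names, and the base address `0x601000` used in the usage note are all hypothetical and do not correspond to any real object-file format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry of a toy symbol table: a name plus its offset inside its own
   section, which is all the assembler can know about it. */
struct symbol {
    const char *name;
    uint32_t offset; /* module-relative offset */
};

/* The step left to the linker: add the final base address of the section. */
uint64_t absolute_address(const struct symbol *sym, uint64_t section_base) {
    return section_base + sym->offset;
}

/* Look a symbol up by name. NULL models an external symbol that this
   module does not define and that must be found in another object file. */
const struct symbol *find_symbol(const struct symbol *table, size_t count,
                                 const char *name) {
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(table[i].name, name) == 0) {
            return &table[i];
        }
    }
    return NULL;
}
```

With a table holding `x` at offset 0 and `y` at offset 4, and an assumed section base of `0x601000`, `absolute_address` places `y` at `0x601004`, while a lookup of an undefined name such as `printf` returns `NULL`.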
Assembling:
- Assembling converts a source program into an object program if it is syntactically correct, and generates an intermediate .obj file or module.
- It calculates the offset address for every data item in the data segment and every instruction in the code segment.
- A header containing the incomplete addresses is created in front of the generated obj module during assembling.
- The assembler complains about any syntax error and, if there is one, does not generate the object module.
- The assembler creates .obj, .lst and .crf files; the last two are optional files that can be created at run time.
- For short programs, assembling can be done manually: the programmer translates each mnemonic into machine language using a lookup table.
- The assembler reads each assembly instruction of a program as ASCII characters and translates it into the respective machine code.
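- The symbol table is what the linker uses to locate the variables and functions of each object file, for example within the three important sections in assembly: .bss, .data and .text
- Assembly source can call functions from other files, just as C calls library functions. The assembler cannot compute addresses for these functions or variables; that work is done by the linker.
- In C, a function declared static is not emitted as a global symbol
- The three kinds of object files (from CSAPP, chapter 7)
  - Relocatable object file. Contains binary code and data in a form that can be combined with other relocatable object files at compile time to create an executable object file.
  - Executable object file. Contains binary code and data in a form that can be copied directly into memory and executed.
  - Shared object file. A special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time.
- The nm tool shows the symbols of an object file (readelf and objdump can both show the symbol table as well)
nm -gD yourLib.so
objdump -TC yourLib.so (also works; -C demangles C++ names)
readelf -Ws yourLib.so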
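- A simple example:
...
int x = 0;
int y = -1;
y = x;
...
The fragment above defines two global labels, x and y. Each label is really an address, but when we refer to them in C we refer to the value, not the address. Only the statement y = &x means "assign the value of the label x (that is, the address of x) to y". The statement y = x is actually compiled to
mov eax,[x]
mov dword[y],eax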
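The difference between a label (an address) and the value stored there can also be shown from the C side. This is a minimal sketch; the helper names `assign_value`, `label_x` and `load_from_label` are made up for illustration.

```c
#include <stdint.h>

int x = 0;
int y = -1;

/* y = x: read the value stored at x's address, write it to y's address. */
void assign_value(void) {
    y = x;
}

/* What the bare label x denotes in assembly: the address of the variable. */
uintptr_t label_x(void) {
    return (uintptr_t)&x;
}

/* Dereferencing the label recovers the value, like mov eax,[x]. */
int load_from_label(uintptr_t label) {
    return *(int *)label;
}
```

After setting `x = 7` and calling `assign_value()`, `y` holds 7, and `load_from_label(label_x())` reads the same 7 back through the address.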
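Linker
- The linker further resolves all symbols and function names (using the symbol table of each object file) and links all object files into one final executable. It collects the symbol table information of every object file, merges sections of the same kind (for example the code sections and data sections of different object files), and so on. The symbol table of the final executable is complete.
  - For example, if two .o files both define the function void foo(), the linker has to decide which one to use
  - A function in a .o file that is not marked global cannot be called from other .o files. It is not exported as a global symbol at assembly time, so the linker cannot use it either. (In c these are the functions and variables declared with the static qualifier.)
- Linking involves converting the .OBJ module into an .EXE (executable) module, i.e. executable machine code.
- It completes the addresses left open by the assembler.
- It combines separately assembled object files.
- Linking creates .EXE, .LIB and .MAP files, of which the last two are optional.
(Source of the quoted points: Microprocessors lecture 5: Programming with 8086 Microprocessor)
The linker is responsible for resolving every symbol to its final address.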
- ld is one such linker
- The readelf command is very useful
- Linking
  - static linking
  - dynamic linking
  - runtime linking
References: reference 1; Quora reference
- See the separate md file for more detail
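ld (the linker)
- If you want 32-bit-compatible ELF output, use the -m option here
ld -m elf_i386 -s -o file file.o
linker script
A linker script is used by the linker. It specifies certain default behaviors, such as the standard libraries linked by default and the entry address of the executable.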
Loader
- The loader is usually invoked by the shell process; it loads the code of the executable into memory.
- It loads the program into memory for execution.
- It resolves the remaining addresses.
- This process creates the program segment prefix (PSP) before loading.
- It executes the program to generate the result.
Executable file formats
- A commonly used one is ELF: executable and linkable format
bin      flat-form binary files (e.g. DOS .COM, .SYS)
ith      Intel hex
srec     Motorola S-records
aout     Linux a.out object files
aoutb    NetBSD/FreeBSD a.out object files
coff     COFF (i386) object files (e.g. DJGPP for DOS)
elf32    ELF32 (i386) object files (e.g. Linux)
elf64    ELF64 (x86_64) object files (e.g. Linux)
elfx32   ELFX32 (x86_64) object files (e.g. Linux)
as86     Linux as86 (bin86 version 0.3) object files
obj      MS-DOS 16-bit/32-bit OMF object files
win32    Microsoft Win32 (i386) object files
win64    Microsoft Win64 (x86-64) object files
rdf      Relocatable Dynamic Object File Format v2.0
ieee     IEEE-695 (LADsoft variant) object file format
macho32  NeXTstep/OpenStep/Rhapsody/Darwin/MacOS X (i386) object files
macho64  NeXTstep/OpenStep/Rhapsody/Darwin/MacOS X (x86_64) object files
dbg      Trace of all info passed to output stage
elf      ELF (short name for ELF32)
macho    MACHO (short name for MACHO32)
win      WIN (short name for WIN32)
The objconv tool
- objconv -fnasm main.o disassembles an object file into assembly in the requested syntax
x86 assembly
- .set symbol,expression sets the symbol to a specific value
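AT&T syntax
- Instructions may carry a suffix that gives the operand width
  - "byte" refers to a one-byte integer (suffix b),
  - "word" refers to a two-byte integer (suffix w),
  - "doubleword" refers to a four-byte integer (suffix l), and
  - "quadword" refers to an eight-byte value (suffix q).
- MOV: movl, movw, movb, movq (32, 16, 8 and 64 bits). Without a suffix, the assembler falls back to a default width, normally inferred from the register operands.
Reference: basic mov instructions
When extension is involved, mov also takes an s (sign-extend) or z (zero-extend) suffix.
movzbl %al,%ebx zero-extends the al register (8-bit) into ebx (32-bit)
The memory addressing form of mov is segment:displacement(base register, index register, scale factor),
that is, segment:[base register + displacement + index register * scale factor]
movzwl (%rdx,%rax,2),%edx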
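The address arithmetic and the two kinds of extension above can be mimicked in C. This is a rough sketch, not what any compiler is guaranteed to emit; the function names are hypothetical, and the segment part of the address is ignored.

```c
#include <stdint.h>
#include <string.h>

/* Effective address of displacement(base, index, scale):
   base + displacement + index * scale. */
uintptr_t effective_address(uintptr_t base, intptr_t displacement,
                            uintptr_t index, unsigned scale) {
    return base + (uintptr_t)displacement + index * scale;
}

/* Roughly what movzwl (%rdx,%rax,2),%edx does: load 16 bits from
   rdx + rax * 2 and zero-extend them to 32 bits. */
uint32_t load_u16_zero_extended(const void *rdx, uint64_t rax) {
    uint16_t half;
    uintptr_t address = effective_address((uintptr_t)rdx, 0, (uintptr_t)rax, 2);
    memcpy(&half, (const void *)address, sizeof half);
    return (uint32_t)half;
}

/* movsbl-style extension: the sign bit of the byte is replicated. */
int32_t sign_extend_byte(uint8_t al) {
    return (int32_t)(int8_t)al;
}

/* movzbl-style extension: the upper 24 bits become zero. */
uint32_t zero_extend_byte(uint8_t al) {
    return (uint32_t)al;
}
```

For the byte `0x80`, `sign_extend_byte` gives -128 while `zero_extend_byte` gives 128, which is the whole difference between the s and z suffixes.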
Function definition
- .type start, @function marks the symbol start as a function label
AVX and SSE instructions
Reference: instruction reference 1
x64 registers
- There are sixteen 64-bit registers in x86-64: %rax, %rbx, %rcx, %rdx, %rdi, %rsi, %rbp, %rsp, and %r8-r15. Of these, %rax, %rcx, %rdx, %rdi, %rsi, %rsp, and %r8-r11 are considered caller-save registers, meaning that they are not necessarily saved across function calls. By convention, %rax is used to store a function's return value, if it exists and is no more than 64 bits long. (Larger return types like structs are returned using the stack.) Registers %rbx, %rbp, and %r12-r15 are callee-save registers, meaning that they are saved across function calls. Register %rsp is used as the stack pointer, a pointer to the topmost element in the stack.
calling conv
- x64 and ia32 pass arguments differently: the latter uses the stack (system calls use registers), while the former uses registers as well as the stack
- Windows and Linux also pass arguments differently
x64
- Linux and OS X: %rdi, %rsi, %rdx, %rcx, %r8, and %r9 are used to pass the first six integer or pointer parameters to called functions. Additional parameters (or large parameters such as structs passed by value) are passed on the stack.
; Example function call:
extern putchar
mov rdi,'H' ; function parameter: one char to print
call putchar
- Windows is different
  - Win64 function parameters go in registers rcx, rdx, r8, and r9.
  - Win64 functions assume you've allocated 32 bytes of stack space to store the four parameter registers, plus another 8 bytes to align the stack to a 16-byte boundary.
sub rsp,32+8 ; parameter area, and stack alignment
extern putchar
mov rcx,'H' ; function parameter: one char to print
call putchar
add rsp,32+8 ; clean up stack
  - Win64 treats the registers rdi and rsi as preserved.
ia32
- For typical C++ programs, functions that use the default _cdecl or _stdcall convention push their arguments from right to left
For void fun(int a, int b), b is pushed first and a second
- In 32-bit x86, the base pointer (formerly %ebp, now %rbp) was used to keep track of the base of the current stack frame, and a called function would save the base pointer of its caller prior to updating the base pointer to its own stack frame. With the advent of the 64-bit architecture, this has been mostly eliminated, save for a few special cases when the compiler cannot determine ahead of time how much stack space needs to be allocated for a particular function (see Dynamic stack allocation).
In 32-bit mode, parameters are passed by pushing them onto the stack in reverse order, so the function's first parameter is on top of the stack before making the call. In 32-bit mode Windows and OS X compilers also seem to add an underscore before the name of a user-defined function, so if you call a function foo from C/C++, you need to define it in assembly as "_foo".
- "Scratch" registers are ones you are allowed to overwrite and use for anything you want. "Preserved" registers serve some important purpose somewhere else, so you have to put them back ("save" the register) if you use them.
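Function calls
- Note that pushing the arguments happens in the caller. These are all conventions: they are not implemented by hardware but by software, that is, by the programmer.
- When a function call happens, the caller follows the convention and pushes the caller saved registers (the scratch registers) onto the stack, then pushes the arguments and executes the call instruction. The call instruction automatically pushes the return address and jumps into the callee. The first thing the callee does is save ebp on top of the stack; the callee then pushes the callee saved registers. On exit, the caller is responsible for cleaning up the arguments.
[Figure: what a stack frame should look like]
Of course, on x64 there may be no arguments on the stack at all. If arguments are passed in registers, the callee should first save those register values on the stack if it still needs them after the registers are reused; otherwise the values are lost.
[Figure: order in which the arguments of an armv8 function are placed on the stack]
void test_para(int a0, int a1, int a2, int a3, int a4, int a5, int a6, int a7, int a8, int a9, int a10, int a11, int a12);
// call
test_para(0,1,2,3,4,5,6,7,8,9,10,11,12);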
stack unwinding
Reference: Stack Overflow
return value
- Return values on x86 usually use RDX:RAX
Reference: returning a c struct
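A small C function makes the RDX:RAX pairing easy to observe. This is a sketch under an assumption: on the System V x86-64 ABI a 16-byte struct of two integers is normally returned with the first member in RAX and the second in RDX, which can be checked by compiling with `gcc -O2 -S`. The struct and function names are invented for the example, and other ABIs may return it differently.

```c
#include <stdint.h>

/* Two 64-bit integers: 16 bytes in total, small enough to come back in a
   register pair on the System V x86-64 ABI (an assumption about the target). */
struct div_result {
    int64_t quotient;  /* expected in RAX */
    int64_t remainder; /* expected in RDX */
};

struct div_result divide(int64_t dividend, int64_t divisor) {
    struct div_result result = { dividend / divisor, dividend % divisor };
    return result;
}
```

Calling `divide(17, 5)` yields a quotient of 3 and a remainder of 2 regardless of how the compiler chooses to return the struct.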
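nasm
- nasm source is made up of several parts (layout)
  - Instructions (machine-language mnemonics)
  - Pseudo-instructions:
    - DB, DW, DD, DQ, DT, DO, DY and DZ
    - RESB, RESW, RESD, RESQ, REST, RESO, RESY and RESZ
    - INCBIN: includes an external binary file verbatim
incbin "file.dat" ; include the whole file
incbin "file.dat",1024 ; skip the first 1024 bytes
incbin "file.dat",1024,512 ; skip the first 1024, and actually include at most 512
    - EQU: defines a constant. Note that equ is effectively equivalent to a macro in C: inside an instruction it can be treated as an immediate, and after assembly it is gone (replaced by its numeric value). db, by contrast, allocates bytes in memory; inside an instruction it is treated as an address, so at run time the data is accessed through that address. In the definition below, message is actually an address (a pointer)
message db 'hello, world'
msglen equ $-message
    - TIMES: repeats an instruction or a piece of data
zerobuf: times 64 db 0 ; define 64 bytes, all equal to 0
buffer: db 'hello, world'
times 64-$+buffer db ' '
;;; buffer and $ are both just addresses, and labels can take part in arithmetic,
;;; so 64-$+buffer is the number of bytes still missing up to 64.
;;; times works a bit like a loop: here it pads buffer with spaces
;;; until buffer is 64 bytes long, i.e. 52 statements of db ' '
times 100 resb 1 ;;; times can also be combined with other directives; the effect equals resb 100, but it really repeats resb 1 100 times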
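The equ versus db distinction described above has a loose counterpart in C: an array occupies memory and is used through its address, while an enum constant is folded into the code as an immediate. The sketch below is only an analogy; `message`, `MSGLEN` and `count_char` are illustrative names.

```c
#include <stddef.h>

/* Like message db 'hello, world': bytes that exist in memory at run time. */
const char message[] = {'h', 'e', 'l', 'l', 'o', ',', ' ',
                        'w', 'o', 'r', 'l', 'd'};

/* Like msglen equ $-message: a compile-time constant without storage. */
enum { MSGLEN = sizeof message };

/* message is reached through its address; MSGLEN is just a number. */
size_t count_char(char wanted) {
    size_t hits = 0;
    for (size_t i = 0; i < MSGLEN; ++i) {
        if (message[i] == wanted) {
            ++hits;
        }
    }
    return hits;
}
```

Here `MSGLEN` is 12, the same value that `$-message` produces for the 12-byte string.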
- Features of nasm
  - Labels that start with a dot . are local labels (a leading dot is NASM's syntax for making local labels).
  - An addressing expression written as mov dword [ebx + eax * 4 + 1],1 in nasm becomes movl $1,1(%ebx,%eax,4) under gcc (AT&T syntax)
  - nasm is close to the native format for x86 (intel syntax)
- Assembly statement (layout):
  - [label] mnemonic [operands] [;comment], where mnemonic covers two kinds of things: assembler directives and instructions
- [Figure: 64-bit code layout]
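- Section definitions: sections whose names start with . are predefined, and their attributes are already set.
- Data definitions: these appear in the .bss or .data section
  - message: db "Hello, World", 10 defines a string; message can then be used as an operand in instructions
  - Space can be reserved without initializing it in the .bss section, using the res directives
buffer: resb 64 ; reserve 64 bytes
wordvar: resw 1 ; reserve a word
realarray: resq 10 ; array of ten reals
  - Initialized data examples: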
db 0x55 ; just the byte 0x55
db 0x55,0x56,0x57 ; three bytes in succession
db 'a',0x55 ; character constants are OK
db 'hello',13,10,'$' ; so are string constants
dw 0x1234 ; 0x34 0x12
dw 'a' ; 0x61 0x00 (it's just a number)
dw 'ab' ; 0x61 0x62 (character constant)
dw 'abc' ; 0x61 0x62 0x63 0x00 (string)
dd 0x12345678 ; 0x78 0x56 0x34 0x12
dd 1.234567e20 ; floating-point constant
dq 0x123456789abcdef0 ; eight byte constant
dq 1.234567e20 ; double-precision float
dt 1.234567e20 ; extended-precision float
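- Numeric constants: the suffixes H or X, D or T, Q or O, and B or Y mean hexadecimal, decimal, octal and binary respectively. Hexadecimal can also be written with the prefix 0x, or with $ followed by a digit (as in $0c8)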
mov ax,200 ; decimal
mov ax,0200 ; still decimal
mov ax,0200d ; explicitly decimal
mov ax,0d200 ; also decimal
mov ax,0c8h ; hex
mov ax,$0c8 ; hex again: the 0 is required
mov ax,0xc8 ; hex yet again
mov ax,0hc8 ; still hex
mov ax,310q ; octal
mov ax,310o ; octal again
mov ax,0o310 ; octal yet again
mov ax,0q310 ; octal yet again
mov ax,11001000b ; binary
mov ax,1100_1000b ; same binary constant
mov ax,1100_1000y ; same binary constant once more
mov ax,0b1100_1000 ; same binary constant yet again
mov ax,0y1100_1000 ; same binary constant yet again
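To check that the suffixed spellings above all name the same number, here is a small C parser for the suffix forms only. It is a sketch, not NASM's actual lexer: the function name is hypothetical, prefix forms such as 0x, 0h or $0 are deliberately not handled, and -1 is used as an error value.

```c
#include <ctype.h>
#include <stdlib.h>

/* Parse a constant written with a radix suffix: h/x hex, d/t decimal,
   q/o octal, b/y binary. Underscores are skipped and a missing suffix
   means decimal. Returns -1 for malformed input. */
long parse_suffixed_constant(const char *text) {
    char digits[72];
    size_t n = 0;
    for (const char *p = text; *p != '\0'; ++p) {
        if (*p == '_') {
            continue; /* digit separator, as in 1100_1000b */
        }
        if (n + 1 >= sizeof digits) {
            return -1;
        }
        digits[n++] = (char)tolower((unsigned char)*p);
    }
    if (n == 0) {
        return -1;
    }
    digits[n] = '\0';

    int base = 0; /* 0 means: no suffix found */
    switch (digits[n - 1]) {
        case 'h': case 'x': base = 16; break;
        case 'd': case 't': base = 10; break;
        case 'q': case 'o': base = 8;  break;
        case 'b': case 'y': base = 2;  break;
        default: break;
    }
    if (base == 0) {
        base = 10;
    } else {
        digits[--n] = '\0'; /* drop the suffix character */
    }

    char *end = NULL;
    long value = strtol(digits, &end, base);
    if (end == digits || *end != '\0') {
        return -1;
    }
    return value;
}
```

Feeding it `0c8h`, `0200d`, `310q` and `1100_1000y` returns 200 each time, matching the mov ax examples.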
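The nasm command
- nasm -hf lists the output file formats the assembler supports
- When assembling, use -f to choose the output file format
gas
- gas directives
Reference: a good site for learning 64-bit assembly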
Hello world
- 32-bit
section .text
global _start
;;; write(1, msg, len);
;;; the system call number of write is 4
;;; the arguments go in ebx, ecx and edx
;;; eax holds the system call number
_start:
mov eax,4
mov edx,len
mov ecx,msg
mov ebx,1
int 0x80
mov eax,1 ; system call number of exit
xor ebx,ebx ; exit status 0
int 0x80
section .rodata
msg db 'Hello world',0xa
len equ $ - msg ; length of msg
nasm -f elf32 hello.asm -o hello.o && ld -m elf_i386 -o hello hello.o
- 64-bit
global _start
section .text
_start: mov rax, 1 ; system call for write
mov rdi, 1 ; file handle 1 is stdout
mov rsi, message ; address of string to output
mov rdx, 13 ; number of bytes
syscall ; invoke operating system to do the write
mov rax, 60 ; system call for exit
xor rdi, rdi ; exit code 0
syscall ; invoke operating system to exit
section .data
message: db "Hello, World", 10 ; note the newline at the end
armv8 assembly
Registers Linux reads for CPU information
Vendor Name | Vendor ID |
---|---|
ARM | 0x41 |
Broadcom | 0x42 |
Cavium | 0x43 |
DigitalEquipment | 0x44 |
HiSilicon | 0x48 |
Infineon | 0x49 |
Freescale | 0x4D |
NVIDIA | 0x4E |
APM | 0x50 |
Qualcomm | 0x51 |
Marvell | 0x56 |
Intel | 0x69 |
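The information is stored in the register MIDR_EL1.
Counting from the low end, bits 0-3 are the revision, the minor revision number, such as the p3 in r1p3;
bits 4-15 are the part number (id), the product code the vendor assigns to this CPU; among HiSilicon products, for example, part_id=0xd01 is the Kunpeng-920 chip;
bits 16-19 are the architecture field; ARMv8 parts normally report 0xf here, meaning the features are identified through the ID registers (Linux prints the architecture as 8);
bits 20-23 are the variant, the major revision number, such as the r1 in r1p3;
bits 24-31 are the implementer, i.e. the vendor id; vendor_id=0x48, for example, is HiSilicon.
The directory /sys/devices/system/cpu holds detailed information about every core.
If the register value is
0x00000000481fd010, the vendor is HiSilicon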
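The field layout above translates directly into shifts and masks. The following C sketch decodes a MIDR_EL1 value that has already been read (for instance from a file under /sys); the struct and function names are hypothetical, and actually reading the register is outside its scope.

```c
#include <stdint.h>

/* The five fields of MIDR_EL1, using the bit positions listed above. */
struct midr_fields {
    unsigned implementer;  /* bits 24-31: vendor id */
    unsigned variant;      /* bits 20-23: major revision */
    unsigned architecture; /* bits 16-19 */
    unsigned part_number;  /* bits 4-15 */
    unsigned revision;     /* bits 0-3: minor revision */
};

struct midr_fields decode_midr(uint64_t midr) {
    struct midr_fields fields;
    fields.revision     = (unsigned)(midr & 0xF);
    fields.part_number  = (unsigned)((midr >> 4) & 0xFFF);
    fields.architecture = (unsigned)((midr >> 16) & 0xF);
    fields.variant      = (unsigned)((midr >> 20) & 0xF);
    fields.implementer  = (unsigned)((midr >> 24) & 0xFF);
    return fields;
}
```

Decoding the sample value `0x00000000481fd010` gives implementer 0x48 (HiSilicon), variant 1, part number 0xd01 and revision 0.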
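- On arm a word is 32 bits (single); on intel a word is 16 bits (word)
- Byte (B), halfword (H), word (S), doubleword (D).
- armv8 supports the 64-bit instruction set (A64) and has two execution states, aarch64 and aarch32
- aarch64 is one execution state of armv8. aarch32 exists for compatibility with armv7 and is a superset of the 32-bit instruction set (A32). In armv8-a, the aarch32 registers have to map onto the aarch64 registers, much like ax, eax and rax on ia32 and x86_64 (different modes have different mappings)
- Whether we say aarch32 or aarch64, we are talking about the armv8-a architecture
- Which raises the question: how does a processor decide whether it is running as aarch64 or aarch32?
AArch64 or ARM64 is the 64-bit extension of the ARM architecture. It was first introduced with the ARMv8-A architecture.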
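Common terms
ARM:
- AArch64: AArch64 is the 64-bit execution state of the ARMv8 ISA. A machine in this state executes the A64 instruction set
- AArch32: the 32-bit execution state, which ARMv8-a keeps for backward compatibility. In other words, ARMv8-a has two execution states
- A64: the 64-bit instruction set
- A32: the 32-bit instruction set
- ARMv8: the architecture. armv8 usually means armv8-a. Cortex-A32 is a 32-bit ARMv8-A CPU; most ARMv8-A CPUs support 64-bit. Besides armv8-a there is also the armv8-r series, but those are 32-bit.
- ISA: In computer science, an instruction set architecture (ISA) is an abstract model of a computer. It is also referred to as architecture or computer architecture. A realization of an ISA, such as a central processing unit (CPU), is called an implementation.
- A microarchitecture is an implementation of an ISA
  - For example: how many physical registers there are, the issue width, which cache coherence protocol is used, the number of reservation station entries, and so on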
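Registers
armv7 and aarch32
- 16 32-bit general-purpose registers (R0-R15), of which only r0-r13 can be used freely
  - r14 holds the return address; its alias is the link register (lr)
  - r15 is the pc; mov pc,lr can be used to return from a subroutine
  - Arm v7 has no dedicated sp register; r13 is used instead
- Vector registers: 32 64-bit registers (D0-D31), or 16 128-bit registers (Q0-Q15).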
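aarch64
- 31 64-bit general-purpose registers (X0-X30) and 1 special register. They can also be accessed as 32-bit registers (W0-W30)
  - x29 is the frame pointer (EBP on x86)
  - x30 is the LR
  - Unlike armv7, aarch64 has a dedicated pc register, and the pc cannot be accessed directly
- 32 128-bit vector registers (V0-V31). These registers can also be viewed as 32-bit Sn registers or 64-bit Dn registers.
- aarch64 general-purpose register list
Register | Role | Requirement |
---|---|---|
X0 - X7 | Parameter/result registers | Can Corrupt |
X8 | Indirect result location register | Can Corrupt |
X9 - X15 | Temporary registers | Can Corrupt |
X16 - X17 | Intra-procedure call temporary | Can Corrupt |
X18 | Platform register, otherwise temporary | Can Corrupt |
X19 - X29 | Callee-saved register | Must preserve |
X30 | Link Register (return address of the function call) | Can Corrupt |
- General-purpose register mapping (mode denotes the different exception states)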
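calling conv
Reference: arm calling convention
- The argument registers on armv8 are r0-r7 (x0-x7); on x64 they are rdi,rsi,rdx,rcx,r8,r9. Extra arguments all go on the stack (pushed starting from the right)
- x8 is the indirect result location register: for a large return value it holds the address of the memory where the result is written (it is not the return address of the call). On x86-64, by comparison, the address of such a return value is handed back in the rax register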
Addressing modes
Reference: arm addressing modes. 32-bit general-purpose registers are written R, 64-bit ones X
where base is a register or SP
- Note: sp must be 16-byte aligned
- arm has no push or pop instructions; everything is done with post/pre-indexed str and ldr
- The width of the data loaded from memory is given by the instruction suffix
  - ldrsb => load a byte with sign extension
- The width written to the register is determined by the register name
  - LDRSB W4, <addr>: the destination register is 32 bits wide; one byte is loaded from memory and the upper bits are filled by sign extension
- uxtb, sxtb, uxth, sxth, uxtw, sxtw
The main purpose of the extending operators is to widen a narrower value found in a register to match the number of bits of the operation. An extending operator is of the form kxtw, where k is the kind of integer we want to widen and w is the width of the narrow value. For the former, the kind of integer can be U (unsigned) or S (signed, i.e. two's complement). For the latter, the width can be B, H or W, which mean respectively byte (least significant 8 bits of the register), half-word (least significant 16 bits of the register) or word (least significant 32 bits of the register).
add x0, x1, w2, sxtw // x0 ← x1 + ExtendSigned32To64(w2)
add x0, x1, w2, sxtb // x0 ← x1 + ExtendSigned8To64(w2)
add w0, w1, w2, sxtb // w0 ← w1 + ExtendSigned8To32(w2)
In the last two cases the least significant 8 bits of w2 are extended, but in the first of them they are extended to 64 bits and in the second to 32 bits.
Extension and shift
It is possible to extend a value and then shift it left by 1, 2, 3 or 4 bits by specifying an amount after the extension operator. For instance
mov x0, #0 // x0 ← 0
mov x1, #0x1234 // x1 ← 0x1234
add x2, x0, w1, sxtw #1 // x2 ← x0 + (ExtendSigned32To64(w1) << 1), this sets x2 to 0x2468
add x2, x0, w1, sxtw #2 // x2 ← x0 + (ExtendSigned32To64(w1) << 2), this sets x2 to 0x48d0
add x2, x0, w1, sxtw #3 // x2 ← x0 + (ExtendSigned32To64(w1) << 3), this sets x2 to 0x91a0
add x2, x0, w1, sxtw #4 // x2 ← x0 + (ExtendSigned32To64(w1) << 4), this sets x2 to 0x12340
This may seem a bit odd and arbitrary at this point, but it is actually useful in many cases.
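The extended-register forms of add can be modelled in C to check the values quoted above. This is only an emulation sketch with invented function names; it assumes the usual two's complement behavior when converting to a narrower signed type.

```c
#include <stdint.h>

/* Models add Xd, Xn, Wm, sxtw #shift: sign-extend the low 32 bits
   to 64 bits, shift left, then add. */
uint64_t add_sxtw(uint64_t xn, uint32_t wm, unsigned shift) {
    int64_t extended = (int64_t)(int32_t)wm;
    return xn + ((uint64_t)extended << shift);
}

/* Models add Xd, Xn, Wm, sxtb: sign-extend the low 8 bits to 64 bits. */
uint64_t add_sxtb(uint64_t xn, uint32_t wm) {
    int64_t extended = (int64_t)(int8_t)(wm & 0xFFu);
    return xn + (uint64_t)extended;
}

/* Models add Xd, Xn, Wm, uxtb: zero-extend the low 8 bits to 64 bits. */
uint64_t add_uxtb(uint64_t xn, uint32_t wm) {
    return xn + (uint64_t)(wm & 0xFFu);
}
```

With `0x1234` as the narrow operand, `add_sxtw(0, 0x1234, 1)` gives `0x2468` and a shift of 4 gives `0x12340`. The signed and unsigned variants only differ once the top bit of the narrow value is set: `add_sxtb(10, 0xFF)` is 9, while `add_uxtb(10, 0xFF)` is 265.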
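Disassembling an example
- The machine code of nop
- The machine above is little-endian, that is, the high-order byte sits at the higher address. 2d8 is the address; within the printed word the addresses decrease from left to right, so d503201f is printed starting from the byte at the highest address.
- This disassembly output looks somewhat different from the x86 one (x86 version, arm version)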
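The byte order of the nop encoding can be checked with a few lines of C. The sketch below stores a 32-bit instruction word in little-endian order, independent of the host; the function names are made up for the example.

```c
#include <stdint.h>

/* Store a 32-bit word in little-endian order: lowest byte at the
   lowest address. */
void store_le32(uint32_t word, uint8_t out[4]) {
    for (int i = 0; i < 4; ++i) {
        out[i] = (uint8_t)(word >> (8 * i));
    }
}

/* Rebuild the word from its little-endian bytes. */
uint32_t load_le32(const uint8_t in[4]) {
    uint32_t word = 0;
    for (int i = 0; i < 4; ++i) {
        word |= (uint32_t)in[i] << (8 * i);
    }
    return word;
}
```

For the nop encoding `0xd503201f`, the bytes in increasing address order are `1f 20 03 d5`, so the `d5` that is printed first lives at the highest address.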
arm inline assembly
Reference: cookbook
- Syntax: asm(code_template : output_operand_list : input_operand_list : clobbered_register_list), where code_template holds the instructions, each one enclosed in quotes
asm (
    "TST LR, #0x40\n\t"
    "BEQ from_nonsecure\n\t"
    "from_secure:\n\t"
    "TST LR, #0x04\n\t"
    "ITE EQ\n\t"
    "MRSEQ R0, MSP\n\t"
    "MRSNE R0, PSP\n\t"
    "B hard_fault_handler_c\n\t"
    "from_nonsecure:\n\t"
    "MRS R0, CONTROL_NS\n\t"
    "TST R0, #2\n\t"
    "ITE EQ\n\t"
    "MRSEQ R0, MSP_NS\n\t"
    "MRSNE R0, PSP_NS\n\t"
    "B hard_fault_handler_c\n\t"
);
Addition example
#include <stdio.h>
int add(int i, int j)
{
int res = 0;
__asm ("ADD %[result], %[input_i], %[input_j]"
: [result] "=r" (res)
: [input_i] "r" (i), [input_j] "r" (j)
);
return res;
}
int main(void)
{
int a = 1;
int b = 2;
int c = 0;
c = add(a,b);
printf("Result of %d + %d = %d\n", a, b, c);
}
- Integer addition and subtraction use the general-purpose registers
System calls
- System call numbers: the syscall call numbers for the 32-bit ABI are in /usr/include/i386-linux-gnu/asm/unistd_32.h (same contents in /usr/include/x86_64-linux-gnu/asm/unistd_32.h).
- Pay attention to how system calls take their arguments (ia32 and x64 differ)
  - 32-bit (i386)
    - eax: the system call number
    - ebx, ecx, ...: the system call arguments
    - eax: holds the result
    - There are six registers that store the arguments of the system call used. These are the EBX, ECX, EDX, ESI, EDI, and EBP. These registers take the consecutive arguments, starting with the EBX register. If there are more than six arguments then the memory location of the first argument is stored in the EBX register.
- The strace command can be used to trace the system calls a program makes
- System calls on x64 use the syscall instruction rather than int 0x80
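From C, the same kernel entry point can be reached without going through the usual write() wrapper. The sketch below assumes Linux with glibc, where the `syscall()` function and the `SYS_write` constant are available; `write_stdout_raw` is a hypothetical helper name. The library picks the instruction (syscall on x64, int 0x80 or a vDSO path on i386) and fills the registers described above.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke the write system call directly: the call number comes first,
   followed by the arguments fd, buffer and length. Returns the number
   of bytes written, or -1 on error. */
long write_stdout_raw(const char *text) {
    return syscall(SYS_write, 1, text, strlen(text));
}
```

Running a program that calls `write_stdout_raw("hi\n")` under strace should show a single write(1, "hi\n", 3) line for it.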
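Libraries
- There are two kinds of libraries, static and dynamic. Both are collections of object files (files that have already been through the assembler)
- With a static library, the needed library code is copied into the main program at link time. A static library is naturally linked statically: at run time we no longer depend on the library file, because the executable already contains the library code.
- A dynamic library is position-independent code. At link time the linker does not copy this code in; it only records in the symbol table which library provides each called function. At run time the function is looked up through this table and control is transferred to it. This is a bit like a system call.
- Terminology
  - DSO/DLL: dynamic shared object / dynamic link library
- Tools you may need:
nm ldd objconv readelf
- References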
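x86 calling conventions revisited
On the calling forms related to system calls: Interfacing with operating system libraries requires knowing how to pass parameters and manage the stack. These details on a platform are called a calling convention.
- Caller-saved means the callee is free to use these registers. If the caller was using them before the call and wants to keep using them afterwards, the caller has to save them itself. These are just ordinary registers; whether to save them is up to the caller (so the saving is clearly done by the caller).
- Callee-saved means the value of the register has to survive the call, for example the familiar rsp and rbp. These registers serve other purposes and must always hold the correct value for the current context. A callee that wants to use such a register has to save it first and restore it before returning (the callee is the one using it, so the callee saves it)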
Caller-saved registers (AKA volatile registers, or call-clobbered) are used to hold temporary quantities that need not be preserved across calls.
For that reason, it is the caller’s responsibility to push these registers onto the stack or copy them somewhere else if it wants to restore this value after a procedure call.
It's normal to let a call destroy temporary values in these registers, though.
Callee-saved registers (AKA non-volatile registers, or call-preserved) are used to hold long-lived values that should be preserved across calls.
- microsoft x64 calling convention
- wikipedia x86 calling convention
- agner pdf
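Linux basics
- Linux usually refers to the kernel written by Linus Torvalds. What people call a linux os is generally a linux distribution, such as ubuntu, debian or fedora. GNU is the name Richard Stallman gave to his project, which produced a lot of os software (gcc, for example) but no kernel. As a rough approximation:
linux os = linux kernel + gnu software
- System V was developed by AT&T and is based on unix
- BSD (Berkeley Software Distribution) is not just a kernel but a whole os.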
- FreeBSD: FreeBSD is the most popular BSD, aiming for high performance and ease of use. It works well on standard Intel and AMD 32-bit and 64-bit processors.
- NetBSD: NetBSD is designed to run on almost anything and supports many more architectures. The motto on their homepage is, “Of course it runs NetBSD.”
- OpenBSD: OpenBSD is designed for maximum security — not just with its features, but with its implementation practices. It’s designed to be an operating system banks and other serious institutions would use for critical systems.
- DragonFly BSD: DragonFly BSD was created with the design goal of providing an operating system that would run well in multithreaded environments — for example, in clusters of multiple computers.
- Darwin / Mac OS X: Mac OS X is actually based on the Darwin operating system, which is based on BSD. It’s a bit different from other BSDs. While the low-level kernel and other software is open-source BSD code, most of the rest of the operating system is closed-source Mac OS code. Apple built Mac OS X and iOS on top of BSD so they wouldn’t have to write the low-level operating system themselves, just as Google built Android on top of Linux
- So macOS is in fact based on BSD, which is why common software such as the compiler and sed differs from what a Linux os ships