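In day-to-day development and debugging we frequently run into C call stacks and assembly, so this chapter takes a systematic look at the topic. Some background in assembly will make it easier to follow.
In JavaScript, we are quite free in how we define and call functions: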
function func(a, b, c) {
console.log(a, b, c)
}
func(1)
func(1, 2, 3, 4, 5, 6)
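This works without any problem. In C, however, function calls are checked strictly: if the argument types or the number of arguments do not match, compilation fails outright (implicit conversions aside).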
int arg1_func(int a) {
return a;
}
int arg2_func(int a, int b) {
return a+b;
}
arg1_func(1, 2);
arg2_func(1);
The C code above fails to compile; we will come back to the reason later. Here we refer to the type int(*)(int) as the function signature of arg1_func.
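Why do we need to understand function signatures? Because argument passing in C depends on the function signature and must be settled at compile time. The signature determines how the arguments are handed to the callee and how the return value is handed back.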
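To make the idea of a signature concrete, here is a minimal sketch that reuses arg1_func and arg2_func from above and spells out their signatures as pointer types. The typedef names one_int_fn and two_int_fn and the helper call_through_signatures are made up for this illustration.

```c
#include <assert.h>

static int arg1_func(int a) { return a; }
static int arg2_func(int a, int b) { return a + b; }

/* Hypothetical typedef names: pointer types spelling out the two signatures. */
typedef int (*one_int_fn)(int);
typedef int (*two_int_fn)(int, int);

/* Calls both functions through typed pointers and returns the sum of the results. */
static int call_through_signatures(void) {
    one_int_fn f1 = arg1_func;
    two_int_fn f2 = arg2_func;
    /* f1(1, 2) or f2(1) would be rejected at compile time,
     * because the call does not match the pointer's signature. */
    int total = f1(1) + f2(1, 2);
    assert(total == 4);
    return total;
}
```

Calling call_through_signatures() from main returns 4. The point is that the compiler knows, from the signature alone, how many arguments to set up and of which types.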
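Next, let us look at how C passes arguments. Different architectures handle this differently, but the approaches are broadly similar, so we use the AArch64 architecture for the walkthrough.
Before going down to the low level we need a little ARM background. The following is only a brief introduction; for ARM assembly in detail, see the official armasm_user_guide and the ABI document.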
In AArch64 state, the following registers are available:
For the purposes of function calls, the general-purpose registers are divided into four groups:
Argument registers (X0-X7)
These are used to pass parameters to a function and to return a result. They can be used as scratch registers or as caller-saved register variables that can hold intermediate values within a function, between calls to other functions. The fact that 8 registers are available for passing parameters reduces the need to spill parameters to the stack when compared with AArch32.
Caller-saved temporary registers (X9-X15)
If the caller requires the values in any of these registers to be preserved across a call to another function, the caller must save the affected registers in its own stack frame. They can be modified by the called subroutine without the need to save and restore them before returning to the caller.
Callee-saved registers (X19-X29)
These registers are saved in the callee frame. They can be modified by the called subroutine as long as they are saved and restored before returning.
Registers with a special purpose (X8, X16-X18, X29, X30)
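From the official documentation, what we need to know here is: the 31 general-purpose registers X0-X30, the 32 floating-point registers D0-D31, the stack pointer SP, and the PC, which is a separate register that cannot be operated on directly.
In the C ABI, the general-purpose registers are assigned as follows: X29 is the frame pointer (FP), X30 is the link register (LR) holding the function's return address, X0-X7 are the argument registers, X8 is the indirect result location (related to return values), and X9-X15 are temporary registers. The remaining registers have little to do with the present topic and are not covered here.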
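Keep these registers in mind while reading what follows, especially LR = X30 and FP = X29. W0 and X0 name the same register: the W name is its 32-bit view and the X name is the full 64 bits.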
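The relationship between the W and X names can be modelled in plain C. This is only a sketch on ordinary integers, not real register access, and the helper names read_w and write_w are invented here. It relies on one AArch64 fact: writing a W register clears the upper 32 bits of the corresponding X register.

```c
#include <assert.h>
#include <stdint.h>

/* Reading Wn yields the low 32 bits of Xn. */
static uint32_t read_w(uint64_t x) { return (uint32_t)x; }

/* Writing Wn replaces the low 32 bits; the upper 32 bits of Xn become zero. */
static uint64_t write_w(uint32_t w) { return (uint64_t)w; }

static int w_x_view_demo(void) {
    uint64_t x0 = 0x1122334455667788ULL;
    assert(read_w(x0) == 0x55667788u);
    x0 = write_w(0xdeadbeefu);
    assert(x0 == 0x00000000deadbeefULL);
    return 1;
}
```

This is why the listings below can set a 32-bit int argument with a W-register instruction and then refer to the same register by its X name.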
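The load/store instructions to know are LDR (load) and STR (store); the other load/store instructions are built on these two. The related addressing forms are described in section 6.3.4 of the ABI document. The ones that appear below are: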
Example | Description |
---|---|
LDR X0, [X1, #8] | Load from address X1 + 8 |
LDR X0, [X1, #8]! | Pre-index: Update X1 first (to X1 + #8), then load from the new address |
LDR X0, [X1], #8 | Post-index: Load from the unmodified address in X1 first, then update X1 (to X1 + #8) |
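The three addressing forms in the table can be imitated with C pointers. The sketch below is a model only: the "register" X1 is a pointer to 8-byte elements, so adding 1 to it corresponds to adding 8 to the address. The function names are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* LDR X0, [X1, #8]: load from X1 + 8, X1 itself is unchanged. */
static uint64_t ldr_offset(const uint64_t *x1) { return x1[1]; }

/* LDR X0, [X1, #8]!: add 8 to X1 first, then load from the new X1. */
static uint64_t ldr_pre_index(const uint64_t **x1) {
    *x1 += 1;
    return **x1;
}

/* LDR X0, [X1], #8: load from the old X1, then add 8 to X1. */
static uint64_t ldr_post_index(const uint64_t **x1) {
    uint64_t value = **x1;
    *x1 += 1;
    return value;
}

static int addressing_demo(void) {
    const uint64_t mem[3] = {10, 20, 30};
    const uint64_t *x1 = mem;
    assert(ldr_offset(x1) == 20 && x1 == mem);          /* base untouched */
    assert(ldr_pre_index(&x1) == 20 && x1 == mem + 1);  /* base updated before the load */
    assert(ldr_post_index(&x1) == 20 && x1 == mem + 2); /* base updated after the load */
    return 1;
}
```

All three loads return the same element here, but they leave X1 in different states, which is the only difference between the forms.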
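During a C call, SP and FP (X29) work as a pair: together they delimit the stack region that belongs to one function, also known as its stack frame. LR (X30) is saved into that frame next to FP when the function itself makes further calls.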
The rough structure of a stack frame is as follows:
This structure is very important to us and is the focus of this discussion.
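One part of the frame that is easy to model is the frame record: the saved FP and LR pair that FP points to, which links each frame to its caller's frame. The sketch below walks such a chain. It runs on synthetic records with made-up addresses rather than on a real stack, and the names frame_record and walk_frames are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The pair stored at the address FP points to. */
struct frame_record {
    const struct frame_record *prev_fp; /* caller's FP (saved X29) */
    uintptr_t lr;                       /* return address (saved X30) */
};

/* Follows the FP chain and collects return addresses; returns how many were written. */
static size_t walk_frames(const struct frame_record *fp, uintptr_t *out, size_t max) {
    size_t n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp->lr;
        fp = fp->prev_fp;
    }
    return n;
}

static int frame_chain_demo(void) {
    /* Synthetic records standing in for main -> caller -> callee. */
    struct frame_record main_frame = {NULL, 0x1000};
    struct frame_record caller_frame = {&main_frame, 0x2000};
    struct frame_record callee_frame = {&caller_frame, 0x3000};
    uintptr_t trace[8];
    size_t n = walk_frames(&callee_frame, trace, 8);
    assert(n == 3 && trace[0] == 0x3000 && trace[1] == 0x2000 && trace[2] == 0x1000);
    return 1;
}
```

A debugger's backtrace follows the same idea: start at the current FP, read the saved LR, then hop to the previous FP until the chain ends.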
For a function call, the arguments are placed in X0-X7 and the return value comes back in X0. Let us analyze a simple example:
int lessArg(int arg1, char *arg2) {
return 0;
}
Before the call:
caller:
0x100791c6c <+20>: mov w9, #0x0
0x100791c70 <+24>: stur w9, [x29, #-0x14]
0x100791c74 <+28>: stur w0, [x29, #-0x18]
0x100791c78 <+32>: str x1, [x8, #0xa0]
0x100791c7c <+36>: mov x1, #0x0 ; second argument: arg2 = 0
0x100791c80 <+40>: mov x0, x9 ; first argument: arg1 = 0
0x100791c84 <+44>: str x1, [sp, #0x88]
0x100791c88 <+48>: str x8, [sp, #0x80]
0x100791c8c <+52>: str w9, [sp, #0x7c]
0x100791c90 <+56>: bl 0x100791a60 ; CALL 'lessArg'
cfunction`lessArg:
0x104491a98 <+0>: sub sp, sp, #0x10 ; the stack grows downward, so SP = SP - 0x10
0x104491a9c <+4>: mov w8, #0x0
0x104491aa0 <+8>: str w0, [sp, #0xc]
0x104491aa4 <+12>: str x1, [sp]
0x104491aa8 <+16>: mov x0, x8 ; return value: X0 = 0
0x104491aac <+20>: add sp, sp, #0x10 ; release the stack frame
0x104491ab0 <+24>: ret
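The result matches what the ABI describes: with 8 or fewer arguments, the arguments are passed in registers.
What if there are more than 8 arguments? According to the ABI they are passed on the stack. Let us look at the result: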
int moreArg(int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int arg8, int arg9, int arg10, int arg11, int arg12, int arg13, char *arg14) {
return 0;
}
caller:
0x100791c9c <+68>: mov x1, sp ; x1 = SP
0x100791ca0 <+72>: ldr x30, [sp, #0x88]
0x100791ca4 <+76>: str x30, [x1, #0x18]
0x100791ca8 <+80>: orr w9, wzr, #0xc
0x100791cac <+84>: str w9, [x1, #0x10] ; SP+0x10 = arg13
0x100791cb0 <+88>: mov w9, #0xb
0x100791cb4 <+92>: str w9, [x1, #0xc] ; SP+0xc = arg12
0x100791cb8 <+96>: mov w9, #0xa
0x100791cbc <+100>: str w9, [x1, #0x8] ; SP+0x8 = arg11
0x100791cc0 <+104>: mov w9, #0x9
0x100791cc4 <+108>: str w9, [x1, #0x4] ; SP+0x4 = arg10
0x100791cc8 <+112>: orr w9, wzr, #0x8
0x100791ccc <+116>: str w9, [x1] ; SP = arg9
0x100791cd4 <+124>: orr w2, wzr, #0x2 ; w2 = arg3
0x100791cd8 <+128>: orr w3, wzr, #0x3 ; w3 = arg4
0x100791cdc <+132>: orr w4, wzr, #0x4 ; w4 = arg5
0x100791ce0 <+136>: mov w5, #0x5 ; w5 = arg6
0x100791ce4 <+140>: orr w6, wzr, #0x6 ; w6 = arg7
0x100791ce8 <+144>: orr w7, wzr, #0x7 ; w7 = arg8
0x100791cec <+148>: ldr w10, [sp, #0x7c]
0x100791cf0 <+152>: str w0, [sp, #0x78]
0x100791cf4 <+156>: mov x0, x10 ; w0 = arg1
0x100791cd0 <+120>: orr w9, wzr, #0x1
0x100791cf8 <+160>: mov x1, x9 ; w1 = arg2
0x100791cfc <+164>: str x8, [sp, #0x70]
0x100791d00 <+168>: str w9, [sp, #0x6c]
0x100791d04 <+172>: bl 0x100791a7c ; moreArg at main.mm:16
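As the listing shows, the arguments from arg9 onward are stored starting at the caller's SP: the five ints arg9 to arg13 at SP through SP+0x10, and the pointer arg14 at SP+0x18. This is the low end of the current frame, directly above the top of the next frame.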
cfunction`moreArg:
0x104491ab4 <+0>: sub sp, sp, #0x40 ; allocate stack space; we write the original sp as 'SP0'
; so SP = SP0 - 0x40
0x104491ab8 <+4>: ldr x8, [sp, #0x58]
0x104491abc <+8>: ldr w9, [sp, #0x50] ; w9 = [SP + 0x50] = [SP0 - 0x40 + 0x50] = [SP0 + 0x10]
; that is, w9 = arg13
; by the same derivation, the loads below fetch arg12 down to arg9
0x104491ac0 <+12>: ldr w10, [sp, #0x4c]
0x104491ac4 <+16>: ldr w11, [sp, #0x48]
0x104491ac8 <+20>: ldr w12, [sp, #0x44]
0x104491acc <+24>: ldr w13, [sp, #0x40] ; w13 = [SP + 0x40] = [SP0 - 0x40 + 0x40] = [SP0]
; that is, w13 = arg9
0x104491ad0 <+28>: mov w14, #0x0
0x104491ad4 <+32>: str w0, [sp, #0x3c]
0x104491ad8 <+36>: str w1, [sp, #0x38]
0x104491adc <+40>: str w2, [sp, #0x34]
0x104491ae0 <+44>: str w3, [sp, #0x30]
0x104491ae4 <+48>: str w4, [sp, #0x2c]
0x104491ae8 <+52>: str w5, [sp, #0x28]
0x104491aec <+56>: str w6, [sp, #0x24]
0x104491af0 <+60>: str w7, [sp, #0x20]
0x104491af4 <+64>: str w13, [sp, #0x1c]
0x104491af8 <+68>: str w12, [sp, #0x18]
0x104491afc <+72>: str w11, [sp, #0x14]
0x104491b00 <+76>: str w10, [sp, #0x10]
0x104491b04 <+80>: str w9, [sp, #0xc]
0x104491b08 <+84>: str x8, [sp]
0x104491b0c <+88>: mov x0, x14
0x104491b10 <+92>: add sp, sp, #0x40 ; =0x40
0x104491b14 <+96>: ret
So the arguments beyond the eighth are placed on the stack, packed upward from the caller's SP in declaration order, as expected.
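The placement seen in the moreArg listing can be reproduced with a small model. The sketch below handles only integer and pointer arguments, and it packs stack arguments at their natural alignment because that is what the listing shows (the listing appears to come from an Apple toolchain). Standard AAPCS64 instead gives each stack argument at least an 8-byte slot, so the offsets would differ there. The names arg_slot and assign_slots are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_ARG_REGS = 8 }; /* X0-X7 */

typedef struct {
    int in_register;  /* 1: passed in register X<reg>; 0: passed on the stack */
    int reg;          /* register index 0..7, or -1 when on the stack */
    size_t stack_off; /* byte offset from the caller's SP when on the stack */
} arg_slot;

/* sizes[i] is the size in bytes (4 or 8) of the i-th integer or pointer argument. */
static void assign_slots(const size_t *sizes, size_t count, arg_slot *out) {
    size_t next_reg = 0, next_off = 0;
    for (size_t i = 0; i < count; i++) {
        if (next_reg < NUM_ARG_REGS) {
            out[i] = (arg_slot){1, (int)next_reg++, 0};
        } else {
            /* Round the offset up to the argument's natural alignment. */
            next_off = (next_off + sizes[i] - 1) & ~(sizes[i] - 1);
            out[i] = (arg_slot){0, -1, next_off};
            next_off += sizes[i];
        }
    }
}

static int more_arg_layout_demo(void) {
    size_t sizes[14];
    arg_slot slots[14];
    for (int i = 0; i < 13; i++) sizes[i] = 4; /* arg1..arg13 are int */
    sizes[13] = 8;                              /* arg14 is char * */
    assign_slots(sizes, 14, slots);
    assert(slots[0].in_register && slots[0].reg == 0);     /* arg1  -> w0 */
    assert(slots[7].in_register && slots[7].reg == 7);     /* arg8  -> w7 */
    assert(!slots[8].in_register && slots[8].stack_off == 0x0);   /* arg9  */
    assert(!slots[12].in_register && slots[12].stack_off == 0x10); /* arg13 */
    assert(!slots[13].in_register && slots[13].stack_off == 0x18); /* arg14 */
    return 1;
}
```

The computed offsets 0x0 to 0x10 for arg9 to arg13 and 0x18 for arg14 agree with the str instructions in the caller listing above.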
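So far we have covered how basic types are passed. C also has a kind of data type whose size is not fixed and which can still be passed by value: the struct. Let us see how struct arguments are passed.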
struct SmallStruct {
int arg1;
};
struct SmallStruct smallStructFunc(int arg1, struct SmallStruct arg2) {
struct SmallStruct s = arg2;
return s;
}
caller:
0x100791d24 <+204>: ldur w9, [x29, #-0x30]
0x100791d28 <+208>: mov x1, x9 ; x1 = arg2 !
; the struct's contents are placed directly in x1, since x1 is large enough to hold them
0x100791d2c <+212>: ldr w9, [sp, #0x7c]
0x100791d30 <+216>: str w0, [sp, #0x64]
0x100791d34 <+220>: mov x0, x9 ; w0 = arg1
0x100791d38 <+224>: bl 0x100791b04 ; smallStructFunc at main.mm:32
cfunction`smallStructFunc:
0x1003b5b04 <+0>: sub sp, sp, #0x20 ; =0x20
0x1003b5b08 <+4>: mov x8, x1 ; x8 = arg2
0x1003b5b0c <+8>: str w8, [sp, #0x10]
0x1003b5b10 <+12>: str w0, [sp, #0xc]
0x1003b5b14 <+16>: ldr w8, [sp, #0x10]
0x1003b5b18 <+20>: str w8, [sp, #0x18]
0x1003b5b1c <+24>: ldr w8, [sp, #0x18]
0x1003b5b20 <+28>: mov x0, x8 ; x0 = x8 = arg2
; x0 is used directly as the struct return value
0x1003b5b24 <+32>: add sp, sp, #0x20 ; =0x20
0x1003b5b28 <+36>: ret
As we can see, a small struct can be passed directly in a register, which is not much different from passing an ordinary basic type.
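What "the struct travels in a register" means can be sketched in C by copying the struct's bytes into a 64-bit integer and back. This is a model of the idea, not what the compiler literally emits, and the helpers pack_into_register and unpack_from_register are invented names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct SmallStruct {
    int arg1;
};

/* Copies the struct's bytes into a zeroed 64-bit value, as if loading it into an X register. */
static uint64_t pack_into_register(struct SmallStruct s) {
    uint64_t x = 0;
    memcpy(&x, &s, sizeof s);
    return x;
}

/* Recovers the struct from the 64-bit value on the receiving side. */
static struct SmallStruct unpack_from_register(uint64_t x) {
    struct SmallStruct s;
    memcpy(&s, &x, sizeof s);
    return s;
}

static int small_struct_demo(void) {
    /* The whole struct fits in one 64-bit register. */
    assert(sizeof(struct SmallStruct) <= sizeof(uint64_t));
    struct SmallStruct in = {42};
    struct SmallStruct out = unpack_from_register(pack_into_register(in));
    assert(out.arg1 == 42);
    return 1;
}
```

The round trip loses nothing because the struct is no larger than the register, which is exactly the condition that lets the compiler skip memory for this argument.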
But what if the struct is large enough that registers alone cannot hold its data?
This is where a special role of X8 comes in (XR, the indirect result location). From here on we write X8 as XR.
struct BigStruct {
int arg1; int arg2; int arg3; int arg4; int arg5; int arg6; int arg7; int arg8; int arg9; int arg10; int arg11; int arg12; int arg13; char *arg14;
};
struct BigStruct bigStructFunc(int arg1, struct BigStruct arg2) {
struct BigStruct s = arg2;
return s;
}
caller:
0x100791d3c <+228>: mov x9, x0
0x100791d40 <+232>: stur w9, [x29, #-0x38]
0x100791d44 <+236>: ldr x8, [sp, #0x80]
0x100791d48 <+240>: ldur q0, [x8, #0x78]
0x100791d4c <+244>: str q0, [x8, #0x30]
0x100791d50 <+248>: ldur q0, [x8, #0x68]
0x100791d54 <+252>: stur q0, [x29, #-0xa0]
0x100791d58 <+256>: ldur q0, [x8, #0x58]
0x100791d5c <+260>: stur q0, [x29, #-0xb0]
0x100791d60 <+264>: ldur q0, [x8, #0x48]
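0x100791d64 <+268>: stur q0, [x29, #-0xc0] ; the instructions above copy the temporary arg2 into the callee's argument area on the stack
; so that changes made by the callee do not touch the original data
; for convenience, the copied data is referred to as arg2 from here on
0x100791d68 <+272>: add x8, sp, #0xb0 ; XR = SP + 0xb0
; Callee save area
; an empty region used as temporary storage for the return value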
0x100791d6c <+276>: sub x1, x29, #0xc0 ; x1 = FP - 0xc0 = &arg2
0x100791d70 <+280>: ldr w0, [sp, #0x7c] ; w0 = arg1
0x100791d74 <+284>: bl 0x100791b2c ; bigStructFunc at main.mm:36
cfunction`bigStructFunc:
0x1003b5b2c <+0>: sub sp, sp, #0x20 ; allocate stack space: SP = SP0 - 0x20
0x1003b5b30 <+4>: stp x29, x30, [sp, #0x10] ; unlike the functions above, this one makes a call itself, so FP and LR must be saved on the stack
0x1003b5b34 <+8>: add x29, sp, #0x10
0x1003b5b38 <+12>: orr x2, xzr, #0x40 ; size of the struct = 0x40, passed as the third argument
0x1003b5b3c <+16>: stur w0, [x29, #-0x4]
0x1003b5b40 <+20>: mov x0, x8 ; dst = x0 = XR = SP0 + 0xb0
; the first argument, dst, is the caller's temporary storage area
; the second argument is x1, i.e. the caller's &arg2
0x1003b5b44 <+24>: bl 0x1003b62f0 ; symbol stub for: memcpy
; void *memcpy(void *dst, const void *src, size_t n);
; the copy is done by calling memcpy directly
0x1003b5b48 <+28>: ldp x29, x30, [sp, #0x10]
0x1003b5b4c <+32>: add sp, sp, #0x20 ; =0x20
0x1003b5b50 <+36>: ret
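The return value thus ends up at the location XR points to (*XR), and the caller only needs to copy it from there into its local variable area.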
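The two listings can be read back into C as a hand-lowered version of bigStructFunc. This is a sketch of what the assembly appears to do, written by hand rather than produced by any tool: the hidden result pointer plays the role of XR, and arg2 arrives as a pointer to the caller's copy. The name bigStructFunc_lowered and its parameter names are hypothetical.

```c
#include <assert.h>
#include <string.h>

struct BigStruct {
    int arg1; int arg2; int arg3; int arg4; int arg5; int arg6; int arg7;
    int arg8; int arg9; int arg10; int arg11; int arg12; int arg13; char *arg14;
};

/* Source form: struct BigStruct bigStructFunc(int arg1, struct BigStruct arg2)
 * Lowered form: result stands in for XR (X8), arg2_copy for the pointer in x1. */
static void bigStructFunc_lowered(struct BigStruct *result, int arg1,
                                  const struct BigStruct *arg2_copy) {
    (void)arg1; /* unused, as in the original function */
    memcpy(result, arg2_copy, sizeof *result);
}

static int lowered_call_demo(void) {
    struct BigStruct original;
    memset(&original, 0, sizeof original);
    original.arg1 = 1;
    original.arg13 = 13;

    struct BigStruct arg2_copy = original; /* caller copies the argument first */
    struct BigStruct result_area;          /* caller reserves the area XR points to */
    bigStructFunc_lowered(&result_area, 0, &arg2_copy);
    struct BigStruct ret = result_area;    /* caller copies the result out */

    assert(ret.arg1 == 1 && ret.arg13 == 13 && ret.arg14 == NULL);
    return 1;
}
```

Counting the copies in lowered_call_demo gives three for one call: into the argument copy, into the result area, and out of the result area.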
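As this shows, handling a large struct involves several memory copies, which has some performance cost. Such functions should therefore avoid passing large structs by value where possible: pass a pointer or a reference instead, or make the function inline so that the call is removed during optimization.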
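A minimal sketch of the two styles, using the same BigStruct layout as above. The function names bump_by_value and bump_by_pointer are made up for this comparison; the pointer version hands over one address, while the value version works on a copy of the whole struct.

```c
#include <assert.h>

struct BigStruct {
    int arg1; int arg2; int arg3; int arg4; int arg5; int arg6; int arg7;
    int arg8; int arg9; int arg10; int arg11; int arg12; int arg13; char *arg14;
};

/* By value: the callee works on a copy, so the caller's struct is untouched. */
static int bump_by_value(struct BigStruct s) {
    s.arg1 += 1;
    return s.arg1;
}

/* By pointer: only an address is passed, and the callee changes the caller's struct. */
static int bump_by_pointer(struct BigStruct *s) {
    s->arg1 += 1;
    return s->arg1;
}

static int pass_style_demo(void) {
    struct BigStruct s = {0};
    s.arg1 = 1;
    assert(bump_by_value(s) == 2);
    assert(s.arg1 == 1); /* the copy was modified, not s */
    assert(bump_by_pointer(&s) == 2);
    assert(s.arg1 == 2); /* s itself was modified */
    return 1;
}
```

The trade-off is that the pointer version gives up the isolation a copy provides, so a const-qualified pointer is worth considering when the callee only reads the data.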
According to the Parameter Passing Rules section of AAPCS 64:
If the argument is a Composite Type and the size in double-words of the argument is not more than 8 minus NGRN, then the argument is copied into consecutive general-purpose registers, starting at x[NGRN]. The argument is passed as though it had been loaded into the registers from a double-word-aligned address with an appropriate sequence of LDR instructions loading consecutive registers from memory (the contents of any unused parts of the registers are unspecified by this standard). The NGRN is incremented by the number of registers used. The argument has now been allocated.
Roughly: if enough of the registers X0-X7 remain to hold the struct, it is passed in registers; otherwise it is passed on the stack.
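The size check in the quoted sentence can be written down directly. The sketch below implements only that one sentence; the complete procedure in the standard has further rules (for example for larger composites and for floating-point members) that are not modelled here. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Size rounded up to whole 8-byte double-words. */
static size_t size_in_double_words(size_t bytes) {
    return (bytes + 7) / 8;
}

/* The quoted rule only: a composite fits if its size in double-words
 * is not more than 8 minus NGRN (the next free general-purpose register). */
static int fits_in_gprs(size_t bytes, unsigned ngrn) {
    return ngrn <= 8 && size_in_double_words(bytes) <= 8u - ngrn;
}

static int composite_rule_demo(void) {
    /* In both examples arg1 already took x0, so NGRN = 1. */
    assert(fits_in_gprs(4, 1) == 1);    /* SmallStruct: 4 bytes, one register (x1) */
    assert(fits_in_gprs(0x40, 1) == 0); /* BigStruct: 0x40 bytes, 8 double-words > 7 */
    return 1;
}
```

Both results agree with the listings: SmallStruct went into x1, while BigStruct was handed over through memory.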
If the type, T, of the result of a function is such that
void func(T arg)
would require that arg be passed as a value in a register (or set of registers) according to the rules in §5.4 Parameter Passing, then the result is returned in the same registers as would be used for such an argument.
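Return values follow the same rule.
This document is not the latest version and is a beta; I have not yet found a final release. Many other factors are also involved here, so we will not dig any deeper.
Related reading: C方法的调用参数与ARM汇编(下篇) (C function call arguments and ARM assembly, part 2)
This article comes from the NetEase Practitioner Community (网易实践者社区) and is published with the authorization of its author, Duan Jiashun (段家顺).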