4-OS-Low level Programming


1.Low level Programming

2.Linux ABI
-System calls
--Everything distills into a system call
--/sys, /dev, /proc -> read() & write() syscalls
-What is a system call?
--Special-purpose function call
---Elevates privilege
---Executes function in kernel
-But what is a function call?
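To make the "/proc -> read()" point concrete, here is a minimal sketch that reads the process ID out of /proc/self/stat using nothing but open, read, and close. The helper name proc_self_pid is hypothetical, and the sketch assumes a Linux system with /proc mounted.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read this process's PID from /proc/self/stat.
 * Returns -1 if the file cannot be opened or read. */
long proc_self_pid(void)
{
    char buf[64];
    int fd = open("/proc/self/stat", O_RDONLY);   /* enters the kernel */
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, sizeof(buf) - 1);   /* read() system call */
    close(fd);
    if (n <= 0)
        return -1;

    buf[n] = '\0';
    return strtol(buf, NULL, 10);   /* first field of the stat line is the PID */
}
```

On a typical Linux system the result should match getpid(), which shows that a "file" under /proc is just another target for the same read() system call.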

3.Mechanism for app to interact with the OS
-Similar to function calls
-Code securely implemented in the OS
-Follows predefined interface
--Called “ABI” – Application Binary Interface
--Functions referenced by predefined number
Example syscall triggers
-“trap” instruction
-Special “syscall” instruction
-Forced memory exception

4.Examples
-getpid()
--Return process’s ID
--Function 39 in 64-bit Linux, 20 in FreeBSD
-brk()
--Set or query the current “top” of heap (the program break)
--Function 12 in 64-bit Linux; FreeBSD numbers break as 17 and sbrk as 69
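The "functions referenced by predefined number" idea can be tried from user space with the C library's generic syscall() wrapper. The sketch below assumes Linux with glibc; the function name getpid_by_number is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for our PID by system call number
 * rather than through the named getpid() wrapper. */
long getpid_by_number(void)
{
    /* SYS_getpid expands to the platform's number (39 on x86-64 Linux). */
    return syscall(SYS_getpid);
}
```

Using the SYS_getpid constant instead of a literal 39 keeps the sketch valid on other Linux architectures, where the number differs.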

5.What is a function call?
-Special form of jmp
--Execute a block of code at a given address
--Special instruction: call <fn-address>
--Why not just use jmp?
-What do function calls need?
--int foo(int arg1, char *arg2);
--Location: foo()
--Arguments: arg1, arg2, …
--Return code: int
-Must be implemented at hardware level
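The three ingredients listed above (location, arguments, return code) are visible even in plain C when a function is called through a pointer. This is only an illustrative sketch; foo_example and call_at are hypothetical names.

```c
/* Hypothetical function with the same shape as the slide's foo(). */
int foo_example(int arg1, char *arg2)
{
    return arg2 ? arg1 + 1 : arg1;
}

/* The pointer supplies the location; the call itself still has to pass
 * the arguments and bring a return value back to the caller, which is
 * what a bare jmp would not do. */
int call_at(int (*location)(int, char *), int arg1, char *arg2)
{
    return location(arg1, arg2);
}
```

Calling call_at(foo_example, 41, "x") runs the code at foo_example's address and hands its int result back to the caller.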

6.Hardware implementation
    int foo(int arg1, char *arg2)
    {
        return 0;
    }
    0000000000000107 <foo>:
     107:  55                 push   %rbp
     108:  48 89 e5           mov    %rsp,%rbp
     10b:  89 7d fc           mov    %edi,-0x4(%rbp)
     10e:  48 89 75 f0        mov    %rsi,-0x10(%rbp)
     112:  b8 00 00 00 00     mov    $0x0,%eax
     117:  c9                 leaveq
     118:  c3                 retq
-Location
--Address of function + ret instruction
-Arguments
--Passed in registers (which ones? And why those?)
-Return code
--Stored in register: EAX
To understand this we need to know about assembly programming…

7.Assembly basics
-What makes up assembly code?
--Instructions
---Architecture specific
--Operands
---Registers
---Memory (specified as an address)
---Immediates
--Conventions
---Rules of the road and/or behavior models

8.Registers
-General purpose
--16 bit: AX, BX, CX, DX, SI, DI
--32 bit: EAX, EBX, ECX, EDX, ESI, EDI
--64 bit: RAX, RBX, RCX, RDX, RSI, RDI + others
-Environmental
--RSP, RIP
--RBP = frame pointer, defines local scope
-Special uses
--Calling conventions
---RAX == return code
---RDI, RSI, RDX, RCX… == ordered arguments
--Hardware defined
---Some instructions implicitly use specific registers
---RSI/RDI -> string instructions
---RBP -> leaveq

9.Memory
-x86 provides complex memory addressing capabilities
-Immediate (absolute) addressing
--mov %rsi, 0xfff000
-Direct (register-indirect) addressing
--mov %rsi, (%rbp)
-Offset addressing
--mov %rsi, 0x8(%rax)
-Base + (Index * Scale) + Displacement
--A.K.A. SIB
--Occasionally seen, hardly ever used by hand
--movl %ebp, (%rdi,%rsi,4)
---Address = rdi + rsi * 4
-A more complicated example
--segment:disp(base, index, scale)
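The base + index * scale form maps directly onto C array indexing. The sketch below loads table[index] with a single scaled-index operand; load_scaled is a hypothetical name, the inline assembly assumes GCC or Clang on x86-64, and a plain C fallback is used elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: load table[index] using one SIB memory operand. */
int32_t load_scaled(const int32_t *table, long index)
{
    int32_t value;
#if defined(__x86_64__)
    /* (%1,%2,4): address = table + index * 4, one int32_t per slot.
     * The "memory" clobber tells the compiler the asm reads memory that
     * is not listed as an operand. */
    __asm__ ("movl (%1,%2,4), %0"
             : "=r"(value)
             : "r"(table), "r"(index)
             : "memory");
#else
    value = table[index];   /* portable fallback: same address arithmetic */
#endif
    return value;
}
```

The scale of 4 matches sizeof(int32_t), which is why compilers emit this addressing form for ordinary array accesses.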

10.8/16/32/64 bit operands
-Programmer explicitly specifies operand length with the instruction suffix
-Example: mov reg, reg
--8 bits: movb %al, %bl
--16 bits: movw %ax, %bx
--32 bits: movl %eax, %ebx
--64 bits: movq %rax, %rbx
-What about “movl %ebx, (%rdi)”?
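One way to explore the closing question is to store the same value with each suffix and see how many bytes of the destination change. This is a sketch assuming GCC or Clang on x86-64 (it relies on the documented x86 operand modifiers %b, %w, %k and %q to pick the register width); store_sized is a hypothetical name, and the non-x86 branch emulates the same result with masks.

```c
#include <stdint.h>

/* Hypothetical helper: store the low 1, 2, 4, or 8 bytes of src into *dst,
 * leaving the remaining bytes of *dst untouched. Returns -1 for any other
 * size. */
int store_sized(uint64_t *dst, uint64_t src, int bytes)
{
#if defined(__x86_64__)
    switch (bytes) {
    case 1: __asm__ ("movb %b1, %0" : "+m"(*dst) : "r"(src)); break;
    case 2: __asm__ ("movw %w1, %0" : "+m"(*dst) : "r"(src)); break;
    case 4: __asm__ ("movl %k1, %0" : "+m"(*dst) : "r"(src)); break;
    case 8: __asm__ ("movq %q1, %0" : "+m"(*dst) : "r"(src)); break;
    default: return -1;
    }
#else
    if (bytes != 1 && bytes != 2 && bytes != 4 && bytes != 8)
        return -1;
    /* Fallback: replace only the low-order bytes of the value. */
    uint64_t mask = bytes == 8 ? ~0ULL : (1ULL << (8 * bytes)) - 1;
    *dst = (*dst & ~mask) | (src & mask);
#endif
    return 0;
}
```

With a memory destination the suffix alone decides the width, so the movl case writes exactly four bytes at the target address.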

11.Function call implementation
    int foo(int arg1, char *arg2)
    {
        return 0;
    }
    0000000000000107 <foo>:
     107:  55                 push   %rbp
     108:  48 89 e5           mov    %rsp,%rbp
     10b:  89 7d fc           mov    %edi,-0x4(%rbp)
     10e:  48 89 75 f0        mov    %rsi,-0x10(%rbp)
     112:  b8 00 00 00 00     mov    $0x0,%eax
     117:  c9                 leaveq
     118:  c3                 retq
-Location
--Address of function + ret instruction
-Arguments
--Passed in registers (which ones? And why those?)
-Return code
--Stored in register: EAX
We can now decode what is going on here

12.OS development requires assembly programming
-OS operations are not typically expressible with a higher-level language
--Examples: atomic operations, page table management, configuring segments, system calls(!)
-How to mix assembly with OS code (in C)
--Compile with assembler and link with C code
---.S files assembled with gas
--Inline w/ compiler support
---.c files compiled with gcc

13.Implementing assembler functions
-C functions: location, args, return code
-ASM functions: location only
-Programmer must implement everything else
--Arguments, context, return values
--Everything in foo() from before + function body
-Programmer takes place of compiler
--Must match calling conventions

14.Calling assembler functions
-Programmer implements calling convention
--Behaves just like a regular function
-Only need location
--Linker takes care of the rest
foo.S
    .globl foo        # defines a global symbol
    foo:
        push %rbp
        mov  %rsp, %rbp
        …
main.c
    extern int foo(int, char *);
    int main() {
        int x = foo(1, "test");
    }
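For experimenting without a separate .S file, the same idea can be sketched in a single C file with a top-level asm block. This is an assumption-laden variant, not the slide's two-file layout: it assumes GCC or Clang targeting x86-64 Linux (ELF), and asm_add is a hypothetical function. Other targets fall back to plain C.

```c
#if defined(__x86_64__) && defined(__linux__)
/* Hand-written function: the programmer, not the compiler, follows the
 * calling convention here. */
__asm__(
    ".pushsection .text\n"
    ".globl asm_add\n"             /* export the symbol to the linker */
    ".type asm_add, @function\n"
    "asm_add:\n"
    "    movl %edi, %eax\n"        /* first argument arrives in EDI */
    "    addl %esi, %eax\n"        /* second argument arrives in ESI */
    "    ret\n"                    /* return value is left in EAX */
    ".size asm_add, .-asm_add\n"
    ".popsection\n"
);
#else
int asm_add(int a, int b) { return a + b; }   /* fallback for other targets */
#endif

/* The C side only needs a declaration; the linker resolves the location. */
extern int asm_add(int a, int b);
```

Because the function needs no stack frame, it skips the push %rbp / mov %rsp, %rbp prologue; a function with locals would have to set one up itself.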

15.Inline
-OS only needs a few full-blown assembly functions
--Context switches, interrupt handling, a few others
-Most of the time just need to execute a single instruction
--e.g. set a bit in this control register
-GCC provides the ability to incorporate inline assembly instructions into a regular .c file
--Not a function
--Compiler handles argument marshaling

16.Overview
-Inline assembly includes 2 components
--Assembly code
--Compiler directives for operand marshaling
    asm ( assembler template
        : output operands                /* optional */
        : input operands                 /* optional */
        : list of clobbered registers    /* optional */
        );

17.Inline assembly execution
-Sequence of individual assembly instructions
--Can execute any hardware instruction
--Can reference any register or memory location
--Can reference specified variables in C code
-3 stages of execution
--Load C variables into correct registers or memory
--Execute assembly instructions
--Copy register and memory contents into C variables

18.Specifying inline operands
-How does compiler copy C variables to/from registers?
--C variables and registers are explicitly linked in asm specification
--Sections for input and output operands
--Compiler handles copying to and from variables before and after assembly executed
-Assembly code references marshaled values (index of operand) instead of raw registers

19.Operand Codes
-Wide range of operand codes (“constraints”) are available
--Input: "code"(c-variable)
--Output: "=code"(c-variable)
-Explicit register codes
--a = %rax, %eax, %ax
--b = %rbx, %ebx, %bx
--c = %rcx, %ecx, %cx
--d = %rdx, %edx, %dx
--S = %rsi, %esi, %si
--D = %rdi, %edi, %di
-Other operand codes
--r = any register
--q = a, b, c, d regs
--m = memory operand
--f = floating point reg
--i = immediate
--g = anything
-And many more…
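Explicit register codes matter when an instruction hard-wires its registers. As a sketch (assuming GCC or Clang on x86-64; mul_wide is a hypothetical name), the one-operand mull instruction always multiplies EAX and always writes EDX:EAX, so the "a" and "d" codes pin the C variables to those registers:

```c
#include <stdint.h>

/* Hypothetical helper: 32 x 32 -> 64 bit unsigned multiply. */
uint64_t mul_wide(uint32_t x, uint32_t y)
{
#if defined(__x86_64__)
    uint32_t lo, hi;
    /* mull multiplies EAX by its operand; the result lands in EDX:EAX. */
    __asm__ ("mull %3"
             : "=a"(lo), "=d"(hi)    /* outputs pinned to EAX and EDX */
             : "a"(x), "rm"(y)       /* x must start in EAX; y may be reg or mem */
             : "cc");                /* the flags register is modified */
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)x * y;          /* portable fallback */
#endif
}
```

The template never names a register directly; the constraint codes alone make the compiler place the values where the instruction expects them.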

20.Register example
    int foo(int arg1, char *arg2) {
        int a=10, b;
        asm ("movl %1, %%ecx;

21.Memory example
-x86 can also use memory (SIB, etc.) operands
--“m” operand code
    0000000000000107 <foo>:
       0:  55                      push   %rbp
       1:  48 89 e5                mov    %rsp,%rbp
       4:  89 7d ec                mov    %edi,-0x14(%rbp)
       7:  48 89 75 e0             mov    %rsi,-0x20(%rbp)
       b:  c7 45 fc 0a 00 00 00    movl   $0xa,-0x4(%rbp)
      12:  8b 4d fc                mov    -0x4(%rbp),%ecx
      15:  89 4d f8                mov    %ecx,-0x8(%rbp)
      18:  b8 00 00 00 00          mov    $0x0,%eax
      1d:  c9                      leaveq
      1e:  c3                      retq
    int foo(int arg1, char *arg2) {
        int a=10, b;
        asm ("movl %1, %%ecx;

22.Input/output operands
-Sometimes input and output operands are the same variable
--Transform input variable in some way
    0000000000000107 <foo>:
       0:  55                      push   %rbp
       1:  48 89 e5                mov    %rsp,%rbp
       4:  89 7d ec                mov    %edi,-0x14(%rbp)
       7:  48 89 75 e0             mov    %rsi,-0x20(%rbp)
       b:  c7 45 f8 0a 00 00 00    movl   $0xa,-0x8(%rbp)
      12:  c7 45 fc 05 00 00 00    movl   $0x5,-0x4(%rbp)
      19:  8b 45 fc                mov    -0x4(%rbp),%eax
      1c:  03 45 f8                add    -0x8(%rbp),%eax
      1f:  89 45 fc                mov    %eax,-0x4(%rbp)
      22:  b8 00 00 00 00          mov    $0x0,%eax
      27:  c9                      leaveq
      28:  c3                      retq
    int foo(int arg1, char *arg2) {
        int a=10, b=5;
        asm ("addl %1, %0;\n"
            : "=r"(b)
            : "m"(a), "0"(b)
            : );
        return 0;
    }

23.Input/output operands (2)
-Input/output operands can also be specified with “+”
    0000000000000107 <foo>:
       0:  55                      push   %rbp
       1:  48 89 e5                mov    %rsp,%rbp
       4:  89 7d ec                mov    %edi,-0x14(%rbp)
       7:  48 89 75 e0             mov    %rsi,-0x20(%rbp)
       b:  c7 45 f8 0a 00 00 00    movl   $0xa,-0x8(%rbp)
      12:  c7 45 fc 05 00 00 00    movl   $0x5,-0x4(%rbp)
      19:  8b 45 fc                mov    -0x4(%rbp),%eax
      1c:  03 45 f8                add    -0x8(%rbp),%eax
      1f:  89 45 fc                mov    %eax,-0x4(%rbp)
      22:  b8 00 00 00 00          mov    $0x0,%eax
      27:  c9                      leaveq
      28:  c3                      retq
    int foo(int arg1, char *arg2) {
        int a=10, b=5;
        asm ("addl %1, %0;\n"
            : "+r"(b)
            : "m"(a)
            : );
        return 0;
    }
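The two spellings of a read-write operand can be compared side by side. This sketch assumes GCC or Clang on x86-64; the function names are hypothetical, and unlike the slide code it returns b so the effect is observable and lists "cc" because addl changes the flags.

```c
/* Matching constraint: input "0" must live in the same place as output %0. */
int add_matching(int a, int b)
{
#if defined(__x86_64__)
    __asm__ ("addl %1, %0" : "=r"(b) : "m"(a), "0"(b) : "cc");
#else
    b += a;   /* portable fallback */
#endif
    return b;
}

/* "+r": a single operand that is both read and written. */
int add_readwrite(int a, int b)
{
#if defined(__x86_64__)
    __asm__ ("addl %1, %0" : "+r"(b) : "m"(a) : "cc");
#else
    b += a;   /* portable fallback */
#endif
    return b;
}
```

Both forms describe the same data flow to the compiler; "+" is simply the shorter way to write it.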

24.Clobbered list
-We cheated earlier…
-How does compiler know to save/restore ECX?
--It doesn’t
    int foo(int arg1, char *arg2) {
        int a=10, b;
        asm ("movl %1, %%ecx;

25.Why clobber list?
-Why do we need this?
--Compilers try to optimize performance
---Cache intermediate values and assume values don’t change
--Compiler cannot inspect ASM behavior
---Outside scope of compiler
-Clobber lists tell compiler:
--“You cannot trust the contents of these resources after this point”
--Or “Do not perform optimizations that span this block on these resources”

26.Using clobber lists
-ECX is used implicitly so its value must be saved/restored
-What about “memory”?
    int foo(int arg1, char *arg2) {
        int a=10, b;
        asm ("movl %1, %%ecx;
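Since the slide's code is cut off, the following is a hedged sketch of what a clobber-aware version might look like, not a reconstruction of the original. It assumes GCC or Clang on x86-64; both function names are hypothetical. The first function declares the scratch register ECX; the second shows the "memory" clobber for a store the compiler cannot see in the operand list.

```c
/* Copy a to b through ECX, telling the compiler that ECX is overwritten. */
int copy_via_ecx(int a)
{
    int b;
#if defined(__x86_64__)
    __asm__ ("movl %1, %%ecx\n\t"
             "movl %%ecx, %0"
             : "=m"(b)
             : "m"(a)
             : "ecx");      /* ECX no longer holds whatever it held before */
#else
    b = a;                  /* portable fallback */
#endif
    return b;
}

/* The asm writes through a pointer that is only passed as a register value,
 * so "memory" warns the compiler that memory contents may have changed. */
void store_via_pointer(int *dst, int value)
{
#if defined(__x86_64__)
    __asm__ volatile ("movl %1, (%0)"
                      :
                      : "r"(dst), "r"(value)
                      : "memory");
#else
    *dst = value;           /* portable fallback */
#endif
}
```

Without the "memory" clobber the compiler would be free to keep a stale copy of *dst in a register across the asm statement.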

27.Back to system calls
-Function calls not that special
--Just an abstraction built on top of hardware
-System calls are basically function calls
--With a few minor changes
---Privilege elevation
---Constrained entry points
----Functions can call to any address
----System calls must go through “gates”

28.Implementing system calls
-System calls are implemented as a single function call: syscall()
--read() and write() actually just invoke syscall()
-What does syscall do?
--Enters into the kernel at a known location
--Elevates privilege
--Instantiates kernel-level environment
-Once inside the kernel, an appropriate system call handler is invoked based on arguments to syscall()

29.x86 and Linux
-Number of different mechanisms for implementing syscall
--Legacy: int 0x80 – Invokes a single interrupt handler
--32 bit: SYSENTER – Special instruction that sets up preset kernel environment
--64 bit: SYSCALL – 64 bit version of SYSENTER
-All jump to a preconfigured execution environment inside kernel space
--Either interrupt context or OS-defined context
-What about arguments?
--syscall(int syscall_num, args…)
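As one possible answer to "what about arguments?", the sketch below issues the SYSCALL instruction directly with inline assembly. It assumes x86-64 Linux, where the system call number goes in RAX and the first three arguments in RDI, RSI and RDX, and where the instruction itself overwrites RCX and R11. The name raw_syscall3 is hypothetical, and other platforms fall back to the C library's syscall().

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: invoke a system call with up to three arguments. */
long raw_syscall3(long num, long a1, long a2, long a3)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                 /* result comes back in RAX */
                      : "a"(num),                 /* system call number in RAX */
                        "D"(a1), "S"(a2), "d"(a3) /* args in RDI, RSI, RDX */
                      : "rcx", "r11", "memory");  /* overwritten by SYSCALL */
    return ret;
#else
    return syscall(num, a1, a2, a3);              /* portable fallback */
#endif
}
```

Unlike the libc wrapper, the raw instruction reports failure as a negative errno value in the return register rather than returning -1 and setting errno. For example, raw_syscall3(SYS_getpid, 0, 0, 0) should return the same value as getpid().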