
[MIT6.828] LAB4 Part B: Copy-on-Write Fork

Exercise 3. Implement the sys_env_set_pgfault_upcall system call. Be sure to enable permission checking when looking up the environment ID of the target environment, since this is a "dangerous" system call.

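Modify kern/syscall.c to add support for registering a page fault handler:

1. Modify the sys_env_set_pgfault_upcall() function

Two lines in the code below are commented out. At first I used them to check whether the func passed in was valid, but the grading script would not pass, which is how I learned that validating func is not required here. I still think it would be best to validate it here; after all, a pointer is being passed into the kernel.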

// Set the page fault upcall for 'envid' by modifying the corresponding struct
// Env's 'env_pgfault_upcall' field. When 'envid' causes a page fault, the
// kernel will push a fault record onto the exception stack, then branch to
// 'func'.
//
// Returns 0 on success, < 0 on error. Errors are:
//     -E_BAD_ENV if environment envid doesn't currently exist,
//         or the caller doesn't have permission to change envid.
static int
sys_env_set_pgfault_upcall(envid_t envid, void *func)
{
    struct Env *e;

    // The third argument (1) enables the permission check.
    if (envid2env(envid, &e, 1) < 0)
        return -E_BAD_ENV;

    // Optional validation of 'func'; left disabled because the grading
    // script does not pass with it enabled.
    //if (user_mem_check(curenv, func, sizeof(void *), PTE_U|PTE_P) < 0)
    //    return -E_FAULT;

    e->env_pgfault_upcall = func;
    return 0;
}

2. Modify the syscall function so that sys_env_set_pgfault_upcall() is dispatched for the appropriate call number.
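The post does not show the dispatch change, so here is a minimal standalone sketch of the idea rather than the actual kern/syscall.c code. The call numbers, the E_INVAL value, the two-argument signature, and the name syscall_dispatch are all assumptions made for illustration; in JOS the real numbers come from inc/syscall.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical call numbers and error code, for illustration only. */
enum { SYS_cputs = 0, SYS_env_set_pgfault_upcall = 1 };
enum { E_INVAL = 3 };

static void *registered_upcall;   /* stand-in for e->env_pgfault_upcall */

/* Stand-in for the kernel-side implementation. */
static int
sys_env_set_pgfault_upcall(int envid, void *func)
{
    (void) envid;
    registered_upcall = func;
    return 0;
}

/* Dispatch on the call number; unknown numbers are rejected. */
int32_t
syscall_dispatch(uint32_t syscallno, uint32_t a1, uint32_t a2)
{
    switch (syscallno) {
    case SYS_env_set_pgfault_upcall:
        return sys_env_set_pgfault_upcall((int) a1, (void *) (uintptr_t) a2);
    default:
        return -E_INVAL;
    }
}
```

The only point is that the new call gets its own case in the switch; every other call keeps its existing case.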

Exercise 4. Implement the code in page_fault_handler in kern/trap.c required to dispatch page faults to the user-mode handler. Be sure to take appropriate precautions when writing into the exception stack. (What happens if the user environment runs out of space on the exception stack?)

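Modify kern/trap.c so that the exception handling path calls the page fault handler registered by the user.

Be sure to check whether a user page fault handler is registered, whether the user exception stack has been allocated, whether it overflows, and whether the page fault is nested. Also remember to reserve space for the new UTrapframe and for the return value used in the nested case.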

//page fault handler
void pf_hdl(struct Trapframe *tf)
{
    uint32_t fault_va;

    // Read processor's CR2 register to find the faulting address
    fault_va = rcr2();

    // Handle kernel-mode page faults.
    // LAB 3: Your code here.
    if (tf->tf_cs == GD_KT)
    {
        print_trapframe(tf);
        panic("Page fault in kernel");
    }

    // We've already handled kernel-mode exceptions, so if we get here,
    // the page fault happened in user mode.

    // Call the environment's page fault upcall, if one exists. Set up a
    // page fault stack frame on the user exception stack (below
    // UXSTACKTOP), then branch to curenv->env_pgfault_upcall.
    //
    // The page fault upcall might cause another page fault, in which case
    // we branch to the page fault upcall recursively, pushing another
    // page fault stack frame on top of the user exception stack.
    //
    // The trap handler needs one word of scratch space at the top of the
    // trap-time stack in order to return. In the non-recursive case, we
    // don't have to worry about this because the top of the regular user
    // stack is free. In the recursive case, this means we have to leave
    // an extra word between the current top of the exception stack and
    // the new stack frame because the exception stack _is_ the trap-time
    // stack.
    //
    // If there's no page fault upcall, the environment didn't allocate a
    // page for its exception stack or can't write to it, or the exception
    // stack overflows, then destroy the environment that caused the fault.
    // Note that the grade script assumes you will first check for the page
    // fault upcall and print the "user fault va" message below if there is
    // none. The remaining three checks can be combined into a single test.

    //check page fault user handler && ux stack alloced && ux stack overflow
    if ( (curenv->env_pgfault_upcall != NULL) &&
         ( (tf->tf_esp >= UXSTACKEND+sizeof(struct UTrapframe)+sizeof(int)) || (tf->tf_esp < USTACKTOP)))
    {
        user_mem_assert(curenv, curenv->env_pgfault_upcall, 1, PTE_U|PTE_P);
        user_mem_assert(curenv, (void *)UXSTACKEND, PGSIZE, PTE_U|PTE_W|PTE_P);

        struct UTrapframe *utf;

        //is a trap from ux stack ?
        if (tf->tf_esp < UXSTACKTOP && tf->tf_esp >= UXSTACKEND)
        {//yes: leave one scratch word below the trap-time esp
            utf = (struct UTrapframe *)(tf->tf_esp - sizeof(struct UTrapframe) - sizeof(int));
        }
        else
        {
            utf = (struct UTrapframe *)(UXSTACKTOP - sizeof(struct UTrapframe));
        }

        //prepare UTrapframe
        utf->utf_fault_va = fault_va;
        utf->utf_err = tf->tf_err;
        utf->utf_regs = tf->tf_regs;
        utf->utf_eip = tf->tf_eip;
        utf->utf_eflags = tf->tf_eflags;
        utf->utf_esp = tf->tf_esp;

        //set user pf handler entry
        tf->tf_eip = (uint32_t)curenv->env_pgfault_upcall;
        tf->tf_esp = (uint32_t)utf;

        // run user pf handler;
        env_run(curenv);
    }

    // Destroy the environment that caused the fault.
    cprintf("[%08x] user fault va %08x ip %08x\n",
        curenv->env_id, fault_va, tf->tf_eip);
    print_trapframe(tf);
    //monitor(tf);
    env_destroy(curenv);
}
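To make the placement of the new UTrapframe easier to check by hand, here is a small standalone sketch of the address arithmetic only. It is not kernel code: the constant values (UXSTACKTOP at 0xeec00000, a one-page exception stack, a 52-byte UTrapframe) are assumptions based on the 32-bit JOS layout, UXSTACKEND follows the name used in this post, and utf_address is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants mirroring the 32-bit JOS layout used in this post. */
#define PGSIZE      4096u
#define UXSTACKTOP  0xeec00000u
#define UXSTACKEND  (UXSTACKTOP - PGSIZE)
#define UTF_SIZE    52u   /* sizeof(struct UTrapframe) on 32-bit x86 */

/* Return where the new UTrapframe should go, or 0 if it would not fit. */
uint32_t
utf_address(uint32_t trap_esp)
{
    if (trap_esp >= UXSTACKEND && trap_esp < UXSTACKTOP) {
        /* Nested fault: the frame plus one scratch word must fit below esp. */
        if (trap_esp - UXSTACKEND < UTF_SIZE + 4u)
            return 0;
        return trap_esp - 4u - UTF_SIZE;
    }
    /* First fault: the frame sits at the very top of the exception stack. */
    return UXSTACKTOP - UTF_SIZE;
}
```

With these assumed values, a fault taken on the normal stack puts the frame at 0xeebfffcc, and a second fault taken while esp is 0xeebfffcc puts the next frame at 0xeebfff94, one word lower than a back-to-back layout would.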

Exercise 5. Implement the _pgfault_upcall routine in lib/pfentry.S. The interesting part is returning to the original point in the user code that caused the page fault. You'll return directly there, without going back through the kernel. The hard part is simultaneously switching stacks and re-loading the EIP.

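Modify lib/pfentry.S to add the user-level entry point for the user page fault handler.

Step 0 calls the page fault handler registered by the user. Step 1 sets up the return to the original address where the page fault occurred. Step 2 pops the general-purpose registers. Step 3 pops the flags register. Step 4 switches esp back to the original stack. Step 5 returns to the original address.

The various caveats are already explained in the code comments; be sure to read them carefully before writing any code.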

// Page fault upcall entrypoint.

// This is where we ask the kernel to redirect us to whenever we cause
// a page fault in user space (see the call to sys_set_pgfault_handler
// in pgfault.c).
//
// When a page fault actually occurs, the kernel switches our ESP to
// point to the user exception stack if we're not already on the user
// exception stack, and then it pushes a UTrapframe onto our user
// exception stack:
//
//     trap-time esp
//     trap-time eflags
//     trap-time eip
//     utf_regs.reg_eax
//     ...
//     utf_regs.reg_esi
//     utf_regs.reg_edi
//     utf_err (error code)
//     utf_fault_va            <-- %esp
//
// If this is a recursive fault, the kernel will reserve for us a
// blank word above the trap-time esp for scratch work when we unwind
// the recursive call.
//
// We then have to call up to the appropriate page fault handler in C
// code, pointed to by the global variable '_pgfault_handler'.

.text
.globl _pgfault_upcall
_pgfault_upcall:
    // Call the C page fault handler.
    // Step 0:
    pushl %esp              // function argument: pointer to UTF
    movl _pgfault_handler, %eax
    call *%eax
    addl $4, %esp           // pop function argument

    // Now the C page fault handler has returned and you must return
    // to the trap time state.
    // Push trap-time %eip onto the trap-time stack.
    //
    // Explanation:
    //   We must prepare the trap-time stack for our eventual return to
    //   re-execute the instruction that faulted.
    //   Unfortunately, we can't return directly from the exception stack:
    //   We can't call 'jmp', since that requires that we load the address
    //   into a register, and all registers must have their trap-time
    //   values after the return.
    //   We can't call 'ret' from the exception stack either, since if we
    //   did, %esp would have the wrong value.
    //   So instead, we push the trap-time %eip onto the *trap-time* stack!
    //   Below we'll switch to that stack and call 'ret', which will
    //   restore %eip to its pre-fault value.
    //
    //   In the case of a recursive fault on the exception stack,
    //   note that the word we're pushing now will fit in the
    //   blank word that the kernel reserved for us.
    //
    // Throughout the remaining code, think carefully about what
    // registers are available for intermediate calculations. You
    // may find that you have to rearrange your code in non-obvious
    // ways as registers become unavailable as scratch space.
    // Step 1:
    movl 0x30(%esp), %ebp   // %ebp = trap-time esp
    subl $0x4, %ebp         // make room for one word on the trap-time stack
    movl %ebp, 0x30(%esp)   // store the adjusted trap-time esp back in the UTF
    movl 0x28(%esp), %eax   // %eax = trap-time eip
    movl %eax, (%ebp)       // place trap-time eip on the trap-time stack

    // Restore the trap-time registers. After you do this, you
    // can no longer modify any general-purpose registers.
    // Step 2:
    addl $0x8, %esp         // skip utf_fault_va and utf_err
    popal

    // Restore eflags from the stack. After you do this, you can
    // no longer use arithmetic operations or anything else that
    // modifies eflags.
    // Step 3:
    addl $0x4, %esp         // skip trap-time eip
    popfl

    // Switch back to the adjusted trap-time stack.
    // Step 4:
    popl %esp

    // Return to re-execute the instruction that faulted.
    // Step 5:
    ret
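The offsets 0x28 and 0x30 used in step 1 follow from the UTrapframe layout. As a sanity check, the layout can be written down in plain C and verified with offsetof. The struct definitions below are modeled on the field names used in this post and on the register order that pushal/popal use; treat them as a sketch rather than a copy of inc/trap.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register block in the order popal expects it (lowest address first). */
struct PushRegs {
    uint32_t reg_edi, reg_esi, reg_ebp, reg_oesp;
    uint32_t reg_ebx, reg_edx, reg_ecx, reg_eax;
};

/* Frame the kernel writes onto the user exception stack. */
struct UTrapframe {
    uint32_t utf_fault_va;      /* 0x00: %esp points here on entry */
    uint32_t utf_err;           /* 0x04 */
    struct PushRegs utf_regs;   /* 0x08 .. 0x27 */
    uint32_t utf_eip;           /* 0x28: trap-time eip */
    uint32_t utf_eflags;        /* 0x2c: trap-time eflags */
    uint32_t utf_esp;           /* 0x30: trap-time esp */
};

/* Compile-time checks of the offsets the assembly relies on. */
_Static_assert(offsetof(struct UTrapframe, utf_regs) == 0x08, "regs offset");
_Static_assert(offsetof(struct UTrapframe, utf_eip) == 0x28, "eip offset");
_Static_assert(offsetof(struct UTrapframe, utf_esp) == 0x30, "esp offset");
```

This also explains the two addl instructions: 8 bytes skip utf_fault_va and utf_err before popal, and 4 bytes skip utf_eip before popfl.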

Exercise 6. Finish set_pgfault_handler() in lib/pgfault.c.

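Modify the set_pgfault_handler() function in lib/pgfault.c.

Note the indirection here: instead of registering the handler directly, we register the assembly routine _pgfault_upcall from Exercise 5. Returning from the handler straight to the point where the page fault occurred requires the assembly routine _pgfault_upcall to set up the stack, which plain C code cannot do.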

//
// Set the page fault handler function.
// If there isn't one yet, _pgfault_handler will be 0.
// The first time we register a handler, we need to
// allocate an exception stack (one page of memory with its top
// at UXSTACKTOP), and tell the kernel to call the assembly-language
// _pgfault_upcall routine when a page fault occurs.
//
void
set_pgfault_handler(void (*handler)(struct UTrapframe *utf))
{
    int r;

    if (_pgfault_handler == 0) {
        // First time through!
        if ((r = sys_page_alloc(0, (void *)UXSTACKEND, PTE_U|PTE_W|PTE_P)) < 0)
            panic("sys_page_alloc: %e", r);
        if ((r = sys_env_set_pgfault_upcall(0, _pgfault_upcall)) < 0)
            panic("sys_env_set_pgfault_upcall: %e", r);
    }

    // Save handler pointer for assembly to call.
    _pgfault_handler = handler;
}
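The indirection can be illustrated without any assembly. In the sketch below, which is a plain C simulation and not JOS code, the "kernel" only ever learns about one fixed entry point, while the C handler behind it can be swapped freely. All names here (set_handler, upcall_entry, deliver_fault, handler_a, handler_b) are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

struct UTrapframe;                               /* opaque in this sketch */
typedef void (*handler_fn)(struct UTrapframe *);

static handler_fn pgfault_handler;   /* plays the role of _pgfault_handler */
static handler_fn kernel_upcall;     /* plays the role of env_pgfault_upcall */
static int upcall_registrations;     /* how often we "called the kernel" */
static int last_handler;             /* which example handler ran last */

/* Stand-in for _pgfault_upcall: the only entry point the kernel sees. */
static void
upcall_entry(struct UTrapframe *utf)
{
    pgfault_handler(utf);
    /* The real routine would now restore registers and switch stacks. */
}

void
set_handler(handler_fn h)
{
    if (pgfault_handler == NULL) {
        kernel_upcall = upcall_entry;   /* registered exactly once */
        upcall_registrations++;
    }
    pgfault_handler = h;                /* can be replaced later */
}

/* Simulates the kernel delivering a page fault to user space. */
void deliver_fault(void) { kernel_upcall(NULL); }

/* Two example handlers that just record which one ran. */
void handler_a(struct UTrapframe *utf) { (void) utf; last_handler = 1; }
void handler_b(struct UTrapframe *utf) { (void) utf; last_handler = 2; }
```

Calling set_handler(handler_a) and later set_handler(handler_b) changes which C function runs on a fault, but the entry point registered with the "kernel" is set only on the first call.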

Challenge! Extend your kernel so that not only page faults, but all types of processor exceptions that code running in user space can generate, can be redirected to a user-mode exception handler. Write user-mode test programs to test user-mode handling of various exceptions such as divide-by-zero, general protection fault, and illegal opcode.

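Add a system call that registers user-level handlers for processor exceptions. The simplest implementation is a single system call that takes the exception number as an argument to tell the exceptions apart, which keeps the code much shorter. If that feels awkward for user code (callers have to remember exception numbers), write one user-level lib wrapper per exception. That also lets the set_pgfault_handler function above stay unchanged. There is not much technical depth or challenge here, so I won't write it; maybe later when I get the chance.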
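Since the post leaves this unimplemented, the following is only a sketch of the single-system-call idea, written as standalone C. The function names, the table size, and the E_INVAL value are assumptions; the trap numbers are the standard x86 vectors. In a real kernel the table would live in struct Env, and the upcall would be validated and invoked from the trap path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard x86 exception vectors. */
#define T_DIVIDE  0
#define T_ILLOP   6
#define T_GPFLT   13
#define T_PGFLT   14

#define NEXC      32   /* number of processor exception vectors */
#define E_INVAL   3    /* assumed error code */

typedef void (*upcall_fn)(void);

static upcall_fn exc_upcall[NEXC];   /* would be a per-environment field */

/* One system call for every exception, selected by trap number. */
int
sys_env_set_exc_upcall(uint32_t trapno, upcall_fn func)
{
    if (trapno >= NEXC)
        return -E_INVAL;
    exc_upcall[trapno] = func;
    return 0;
}

/* Optional per-exception wrappers so callers need not know the numbers. */
int set_divide_upcall(upcall_fn f) { return sys_env_set_exc_upcall(T_DIVIDE, f); }
int set_illop_upcall(upcall_fn f)  { return sys_env_set_exc_upcall(T_ILLOP, f); }

/* What the trap path would consult before destroying the environment. */
upcall_fn
lookup_upcall(uint32_t trapno)
{
    return trapno < NEXC ? exc_upcall[trapno] : NULL;
}

void example_upcall(void) { }
```

A lookup that returns NULL corresponds to the existing behavior: no handler is registered, so the environment is destroyed.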

Exercise 7. Implement fork, duppage and pgfault in lib/fork.c.

Test your code with the forktree program. It should produce the following messages, with interspersed 'new env', 'free env', and 'exiting gracefully' messages. The messages may not appear in this order, and the environment IDs may be different.

Modify lib/fork.c.

pgfault() function: check the page's attributes, then allocate a page and copy the data.

//
// Custom page fault handler - if faulting page is copy-on-write,
// map in our own private writable copy.
//
static void
pgfault(struct UTrapframe *utf)
{
    void *addr = (void *) utf->utf_fault_va;
    uint32_t err = utf->utf_err;
    int r;

    // Check that the faulting access was (1) a write, and (2) to a
    // copy-on-write page. If not, panic.
    // Hint:
    //   Use the read-only page table mappings at vpt
    //   (see <inc/memlayout.h>).
    if (!(err & FEC_WR))
        panic("Page fault: not a write access.");
    if (!(vpt[VPN(addr)] & PTE_COW))
        panic("Page fault: not a COW page.");

    // Allocate a new page, map it at a temporary location (PFTEMP),
    // copy the data from the old page to the new page, then move the new
    // page to the old page's address.
    // Hint:
    //   You should make three system calls.
    //   No need to explicitly delete the old page's mapping.
    if ((r = sys_page_alloc(0, PFTEMP, PTE_U|PTE_W|PTE_P)) < 0)
        panic("Page fault: sys_page_alloc err %e.", r);
    // PTE_ADDR() rounds the faulting address down to its page boundary.
    memmove(PFTEMP, (void *)PTE_ADDR(addr), PGSIZE);
    if ((r = sys_page_map(0, PFTEMP, 0, (void *)PTE_ADDR(addr), PTE_U|PTE_W|PTE_P)) < 0)
        panic("Page fault: sys_page_map err %e.", r);
    if ((r = sys_page_unmap(0, PFTEMP)) < 0)
        panic("Page fault: sys_page_unmap err %e.", r);
}
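The check, allocate, copy, remap sequence can be tried outside JOS with a toy model. The sketch below simulates a mapping as a pointer plus permission bits and uses malloc in place of sys_page_alloc; struct mapping and cow_fault are hypothetical, and the bit values are assumed to match the JOS definitions of PTE_W and PTE_COW.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PGSIZE   4096
#define PTE_W    0x002   /* writable (assumed to match JOS) */
#define PTE_COW  0x800   /* copy-on-write marker (assumed to match JOS) */

/* Toy stand-in for one page table entry: backing page plus permissions. */
struct mapping {
    unsigned char *page;
    int perm;
};

/* Handle a write fault on 'm'. Returns 0 on success, -1 on error. */
int
cow_fault(struct mapping *m)
{
    unsigned char *copy;

    if (!(m->perm & PTE_COW))
        return -1;                      /* not a copy-on-write page */
    copy = malloc(PGSIZE);              /* "allocate a new page" */
    if (copy == NULL)
        return -1;
    memcpy(copy, m->page, PGSIZE);      /* copy the old contents */
    m->page = copy;                     /* remap to the private copy */
    m->perm = (m->perm & ~PTE_COW) | PTE_W;
    return 0;
}
```

After cow_fault succeeds for one environment's mapping, writes through it no longer affect any other mapping that still points at the original page, which is the property fork relies on.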

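duppage() function: take care to distinguish writable/copy-on-write pages from read-only pages. This reminded me that the load_icode() function in kern/env.c maps every page as writable, so that is something to improve.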

//
// Map our virtual page pn (address pn*PGSIZE) into the target envid
// at the same virtual address. If the page is writable or copy-on-write,
// the new mapping must be created copy-on-write, and then our mapping must be
// marked copy-on-write as well. (Exercise: Why do we need to mark ours
// copy-on-write again if it was already copy-on-write at the beginning of
// this function?)
//
// Returns: 0 on success, < 0 on error.
// It is also OK to panic on error.
//
static int
duppage(envid_t envid, unsigned pn)
{
    int r;
    void *addr = (void *)(pn << PGSHIFT);

    // LAB 4: Your code here.
    if (vpt[pn] & (PTE_W|PTE_COW))
    {
        if ((r = sys_page_map(0, addr, envid, addr, PTE_P | PTE_U | PTE_COW)) < 0)
            return r;
        if ((r = sys_page_map(0, addr, 0, addr, PTE_P | PTE_U | PTE_COW)) < 0)
            return r;
    }
    else
    {
        if ((r = sys_page_map(0, addr, envid, addr, PTE_P | PTE_U)) < 0)
            return r;
    }
    return 0;
    //panic("duppage not implemented");
}
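The permission decision inside duppage() is small enough to isolate as a pure function, which makes the three cases easy to test. This is a sketch only: duppage_perm is a hypothetical helper, and the bit values are assumed to match the JOS definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the JOS page table flag values. */
#define PTE_P    0x001
#define PTE_W    0x002
#define PTE_U    0x004
#define PTE_COW  0x800

/* Permissions to use when mapping a page with entry 'pte' during fork. */
int
duppage_perm(uint32_t pte)
{
    int perm = PTE_P | PTE_U;

    /* Writable and already-COW pages both become COW, never writable. */
    if (pte & (PTE_W | PTE_COW))
        perm |= PTE_COW;
    return perm;
}
```

A writable page (flags 0x007) and an already copy-on-write page (flags 0x805) both yield 0x805, while a read-only page (flags 0x005) stays at 0x005 and is simply shared.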

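fork() function: note that when a process is created, the child's pgfault_handler is set to null, so it has to be registered again.

It cannot be registered from inside the child, though. As soon as the child runs, calling a function or writing a variable causes a page fault, and because no pgfault_handler is registered yet, the process is destroyed by default. The handler can only be set for the child from the parent. I do wonder why pgfault_handler isn't simply inherited when the process is created; I don't know what the designers had in mind.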

// User-level fork with copy-on-write.
// Set up our page fault handler appropriately.
// Create a child.
// Copy our address space and page fault handler setup to the child.
// Then mark the child as runnable and return.
//
// Returns: child's envid to the parent, 0 to the child, < 0 on error.
// It is also OK to panic on error.
//
// Hint:
//   Use vpd, vpt, and duppage.
//   Remember to fix "env" in the child process.
//   Neither user exception stack should ever be marked copy-on-write,
//   so you must allocate a new page for the child's user exception stack.
//
envid_t
fork(void)
{
    envid_t envid;
    uint8_t *addr;
    int r;
    extern unsigned char end[];

    set_pgfault_handler(pgfault);

    envid = sys_exofork();
    if (envid < 0)
        panic("sys_exofork: %e", envid);

    //child
    if (envid == 0) {
        //can't set pgh here ,must before child run
        //because when child run ,it will make a page fault
        env = &envs[ENVX(sys_getenvid())];
        return 0;
    }

    //parent
    for (addr = (uint8_t*) UTEXT; addr < end; addr += PGSIZE)
        duppage(envid, VPN(addr));
    // 'addr' is a local variable, so &addr lies in the current stack page
    duppage(envid, VPN(&addr));

    //copy user exception stack
    if ((r = sys_page_alloc(envid, (void *)UXSTACKEND, PTE_P|PTE_U|PTE_W)) < 0)
        panic("sys_page_alloc: %e", r);
    if ((r = sys_env_set_pgfault_upcall(envid, env->env_pgfault_upcall)) < 0)
        panic("sys_env_set_pgfault_upcall: %e", r);

    //set child status
    if ((r = sys_env_set_status(envid, ENV_RUNNABLE)) < 0)
        panic("sys_env_set_status: %e", r);

    return envid;
}

Challenge! Implement a shared-memory fork() called sfork(). This version should have the parent and child share all their memory pages (so writes in one environment appear in the other) except for pages in the stack area, which should be treated in the usual copy-on-write manner. Modify user/forktree.c to use sfork() instead of regular fork(). Also, once you have finished implementing IPC in part C, use your sfork() to run user/pingpongs. You will have to find a new way to provide the functionality of the global env pointer.

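Modify lib/fork.c. Turn sfork() into a fork() that shares global data and code between parent and child for both reading and writing.

It is essentially the same as fork(). The only thing to watch is that when sharing the pages between UTEXT and end, you simply map each one into the child again with its original attributes. Pay attention to the permissions.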

// Challenge!
int
sfork(void)
{
    envid_t envid;
    uint8_t *addr;
    int r;
    extern unsigned char end[];

    set_pgfault_handler(pgfault);

    envid = sys_exofork();
    if (envid < 0)
        panic("sys_exofork: %e", envid);

    //child
    if (envid == 0) {
        //can't set pgh here ,must before child run
        //because when child run ,it will make a page fault
        env = &envs[ENVX(sys_getenvid())];
        return 0;
    }

    //parent
    //share pages: keep only the permission bits of the existing mapping
    for (addr = (uint8_t*) UTEXT; addr < end; addr += PGSIZE)
    {
        if ((r = sys_page_map(0, addr, envid, addr, PTE_USER & vpt[VPN(addr)])) < 0)
            return r;
    }

    //copy normal stack
    duppage(envid, VPN(&addr));

    //copy user exception stack
    if ((r = sys_page_alloc(envid, (void *)UXSTACKEND, PTE_P|PTE_U|PTE_W)) < 0)
        panic("sys_page_alloc: %e", r);
    if ((r = sys_env_set_pgfault_upcall(envid, env->env_pgfault_upcall)) < 0)
        panic("sys_env_set_pgfault_upcall: %e", r);

    //set child status
    if ((r = sys_env_set_status(envid, ENV_RUNNABLE)) < 0)
        panic("sys_env_set_status: %e", r);

    return envid;
}
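The expression PTE_USER & vpt[VPN(addr)] is what "keep the original attributes" comes down to: it strips the physical frame number and the hardware accessed/dirty bits, leaving only flags that user code may pass to sys_page_map. A standalone sketch, assuming PTE_USER is PTE_AVAIL|PTE_P|PTE_W|PTE_U (0xE07) as in the JOS version this post appears to use; share_perm is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag values, following the JOS definitions. */
#define PTE_P      0x001
#define PTE_W      0x002
#define PTE_U      0x004
#define PTE_AVAIL  0xE00                       /* software-defined bits */
#define PTE_USER   (PTE_AVAIL | PTE_P | PTE_W | PTE_U)

/* Permissions for sharing a page as-is: same flags, nothing else. */
int
share_perm(uint32_t pte)
{
    return (int) (pte & PTE_USER);
}
```

For example, an entry 0x00123067 (frame 0x123, present, writable, user, accessed, dirty) reduces to 0x007, so the child gets a writable shared mapping of the same page.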

Challenge! Your implementation of fork makes a huge number of system calls. On the x86, switching into the kernel using interrupts has non-trivial cost. Augment the system call interface so that it is possible to send a batch of system calls at once. Then change fork to use this interface.

How much faster is your new fork?

You can answer this (roughly) by using analytical arguments to estimate how much of an improvement batching system calls will make to the performance of your fork: How expensive is an int 0x30 instruction? How many times do you execute int 0x30 in your fork? Is accessing the TSS stack switch also expensive? And so on...

Alternatively, you can boot your kernel on real hardware and really benchmark your code. See the RDTSC (read time-stamp counter) instruction, defined in the IA32 manual, which counts the number of clock cycles that have elapsed since the last processor reset. QEMU doesn't emulate this instruction faithfully (it can either count the number of virtual instructions executed or use the host TSC, neither of which reflects the number of cycles a real CPU would require).

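This challenge asks you to measure the speed of fork, which can be done with the RDTSC instruction or with an instruction counter. This fork makes a large number of system calls, and every one of them switches stacks and pushes and pops many registers. Later I also added saving of the floating-point registers, which is another 512 bytes. It will be extremely slow. I won't write the code here. The RDTSC instruction format is documented everywhere, so just google it.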
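For reference, reading the time-stamp counter from C takes only a few lines with GCC-style inline assembly. This is a generic sketch, not code from the lab: read_tsc, cycles_elapsed, and sample_work are hypothetical names, and the non-x86 fallback to clock() exists only so the example compiles elsewhere. As the handout notes, values measured under QEMU do not reflect real cycle counts.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the time-stamp counter (falls back to clock() off x86). */
static inline uint64_t
read_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;

    /* RDTSC returns the low half in EAX and the high half in EDX. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t) hi << 32) | lo;
#else
    return (uint64_t) clock();
#endif
}

static volatile int work_done;

/* Example workload; in the lab this would be a call to fork(). */
void
sample_work(void)
{
    volatile unsigned i;

    for (i = 0; i < 1000; i++)
        ;
    work_done = 1;
}

/* Ticks that elapsed while 'fn' ran. */
uint64_t
cycles_elapsed(void (*fn)(void))
{
    uint64_t start = read_tsc();

    fn();
    return read_tsc() - start;
}
```

Measuring the same fork with and without batched system calls, and averaging over several runs, would give the rough comparison the challenge asks for.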
