根據助教的指示,測試執行是否正常
./nachos -e ../test/test2 -e ../test/test1
執行結果如下
Ticks: total 200, idle 32, system 40, user 128
Disk I/O: reads 0, writes 0
Console I/O: reads 0, writes 0
Paging: faults 0
Network I/O: packets received 0, sent 0
user@user-VirtualBox:~/Documents/nachos/nachos-4.0/code/userprog$ ./nachos -e ../test/test2 -e ../test/test1
Total threads number is 2
Thread ../test/test2 is executing.
Thread ../test/test1 is executing.
Print integer:20
Print integer:21
Print integer:22
Print integer:9
Print integer:8
Print integer:7
Print integer:6
Print integer:23
Print integer:22
Print integer:21
Print integer:20
Print integer:19
return value:0
Print integer:18
Print integer:17
Print integer:16
Print integer:15
Print integer:14
Print integer:13
Print integer:12
Print integer:11
Print integer:10
Print integer:9
Print integer:8
Print integer:7
Print integer:6
return value:0
No threads ready or runnable, and no pending interrupts.
Assuming the program completed.
Machine halting!
Ticks: total 600, idle 23, system 100, user 477
Disk I/O: reads 0, writes 0
Console I/O: reads 0, writes 0
Paging: faults 0
Network I/O: packets received 0, sent 0
仔細觀察結果後發現一個大問題,原本test2的輸出是遞增,但在載入test1之後,test2的輸出卻變成了遞減。於是開始檢查test1與test2的程式內容。
test1.c :
#include "syscall.h"
main()
{
int n;
for (n=9;n>5;n--)
PrintInt(n);
}
test2.c :
#include "syscall.h"
main()
{
int n;
for (n=20;n<=25;n++)
PrintInt(n);
}
看完程式碼後發現test2改變的行為不只是遞減,連停止條件都不一樣了。但很明顯的,test2的行為恰好變成了test1的行為(遞減並且只有在比5大的時候才執行),感覺像是被test1覆蓋過去一樣,所以可能是context switching的時候出了問題導致這樣的結果,但也不排除有其他原因造成此現象。
如果是context switching的問題,那可能的原因是在context switching的時候,test2在記憶體的部分狀態(可能是PCB也可能是儲存指令的記憶體內容)被test1覆蓋掉了。所以首先檢查userprog底下的最像是記憶體管理的檔案。
addrspace .cc :
bool
AddrSpace::Load(char *fileName)
{
OpenFile *executable = kernel->fileSystem->Open(fileName);
NoffHeader noffH;
unsigned int size;
if (executable == NULL) {
cerr << "Unable to open file " << fileName << "\n";
return FALSE;
}
executable->ReadAt((char *)&noffH, sizeof(noffH), 0);
if ((noffH.noffMagic != NOFFMAGIC) &&
(WordToHost(noffH.noffMagic) == NOFFMAGIC))
SwapHeader(&noffH);
ASSERT(noffH.noffMagic == NOFFMAGIC);
// how big is address space?
size = noffH.code.size + noffH.initData.size + noffH.uninitData.size
+ UserStackSize; // we need to increase the size
// to leave room for the stack
numPages = divRoundUp(size, PageSize);
// cout << "number of pages of " << fileName<< " is "<<numPages<<endl;
size = numPages * PageSize;
ASSERT(numPages <= NumPhysPages); // check we're not trying
// to run anything too big --
// at least until we have
// virtual memory
DEBUG(dbgAddr, "Initializing address space: " << numPages << ", " << size);
// then, copy in the code and data segments into memory
if (noffH.code.size > 0) {
DEBUG(dbgAddr, "Initializing code segment.");
DEBUG(dbgAddr, noffH.code.virtualAddr << ", " << noffH.code.size);
executable->ReadAt(
&(kernel->machine->mainMemory[noffH.code.virtualAddr]),
noffH.code.size, noffH.code.inFileAddr);
}
if (noffH.initData.size > 0) {
DEBUG(dbgAddr, "Initializing data segment.");
DEBUG(dbgAddr, noffH.initData.virtualAddr << ", " << noffH.initData.size);
executable->ReadAt(
&(kernel->machine->mainMemory[noffH.initData.virtualAddr]),
noffH.initData.size, noffH.initData.inFileAddr);
}
delete executable; // close file
return TRUE; // success
}
若真的是context switching的時候發生錯誤,那錯誤應該會發生在addrspace.cc
的130行以及137行。假如一起執行test1, test2時,noffH.code.virtualAddr
的值都一樣的話,就代表context switching導致了上述之錯誤。因此我想要使用內建debug功能追蹤,而程式很剛好地在128行就有virtual address的debug,所以只要在error message找到它就好。以下擷取debug message中含有Initializing
的部分。
Initializing address space: 11, 1408
Initializing code segment.
0, 336
Initializing stack pointer: 1392
...
Initializing address space: 11, 1408
Initializing code segment.
0, 336
Initializing stack pointer: 1392
...
可以看到的是兩次code segment的virtual address是相同的,在這個addrspace.cc
的寫法之下,兩次的physical address也會是相同的,也就是說test1的指令真的覆蓋掉了test2的指令。不過這裡我就產生了一個疑問,既然記憶體的內容被覆蓋了,而且也沒有看到Initializing data segment
,那為什麼test2中n的內容沒有被改變呢?
因為兩次的virtual address都是相同,所以不能再繼續將virtual address直接對應到physical address。我們必須對每一個程式維護一個page table,使得兩個程式所使用的physical address不會重複。為此我們必須要有一個維護physical address的功能。為此增加phyPageStat
用於管理physical page的使用情形,以及numFreePages
方便記錄剩餘可用的physical page數目。
addrspace.h:
class AddrSapce{
...
static bool phyPageStat[NumPhysPages]; // Managing usage of physical pages
static unsigned int numFreePages; // Recording the physical pages can be used
...
}
為了增加程式的可讀性,定義unused跟occupied,兩者用於表示physical page的使用情形。
addrspace.h:
#define UNUSED false
#define OCCU true
於程式的一開始初始化physical page相關的變數。
addrspace .cc :
bool AddrSpace::phyPageStat[NumPhysPages] = {UNUSED};
unsigned int AddrSpace::numFreePages = NumPhysPages;
因為新增加了維護physical page的工作,所以在destruction的時候,記得要把physical page設定成unused,並一同維護可用page數目。
addrspace .cc :
AddrSpace::~AddrSpace()
{
unsigned int i = 0;
for(i = 0; i < numPages; i += 1){
AddrSpace::phyPageStat[pageTable[i].physicalPage] = UNUSED;
AddrSpace::numFreePages += 1;
}
delete pageTable;
}
我在page table維護的策略就是從頭開始找physical address,有能用的就先佔下來,不能用的就跳過。
addrspace .cc :
AddrSpace::Load(char *fileName)
{
...
// find the unused physical memory to used
unsigned int i = 0, phyPageId = 0;
for(; i < numPages; i += 1){
while((phyPageId < NumPhysPages) && (AddrSpace::phyPageStat[phyPageId] == OCCU)) phyPageId += 1;
AddrSpace::phyPageStat[phyPageId] = OCCU;
AddrSpace::numFreePages--;
bzero(&(kernel->machine->mainMemory[phyPageId * PageSize]), PageSize);
pageTable[i].physicalPage = phyPageId;
pageTable[i].use = false;
pageTable[i].valid = true;
}
...
}
接下來在把code讀進memory的時候必須要把這邊的virtual address轉回physical address,這就是利用剛剛建立的page table的時候了。我們首先使用noffH.code.virtualAddr / PageSize
求出這段程式在virtual memory中是位於第幾個page,接著丟進pageTable
中找出它在physical memory中的page index,最後不要忘記要把offset(noffH.code.virtualAddr % PageSize
)加回來。
而對於data的部分也是同理。
addrspace .cc :
AddrSpace::Load(char *fileName)
{
...
// then, copy in the code and data segments into memory
if (noffH.code.size > 0) {
DEBUG(dbgAddr, "Initializing code segment.");
DEBUG(dbgAddr, noffH.code.virtualAddr << ", " << noffH.code.size);
executable->ReadAt(
&(kernel->machine->mainMemory[
pageTable[noffH.code.virtualAddr / PageSize].physicalPage * PageSize + (noffH.code.virtualAddr % PageSize)
]),
noffH.code.size, noffH.code.inFileAddr);
}
if (noffH.initData.size > 0) {
DEBUG(dbgAddr, "Initializing data segment.");
DEBUG(dbgAddr, noffH.initData.virtualAddr << ", " << noffH.initData.size);
executable->ReadAt(
&(kernel->machine->mainMemory[
pageTable[noffH.initData.virtualAddr / PageSize].physicalPage * PageSize + (noffH.initData.virtualAddr % PageSize)
]),
noffH.initData.size, noffH.initData.inFileAddr);
}
...
}
一樣輸入一開始所使用的指令
./nachos -e ../test/test2 -e ../test/test1
執行結果如下
Total threads number is 2
Thread ./test/test2 is executing.
Thread ./test/test1 is executing.
Print integer:20
Print integer:21
Print integer:22
Print integer:9
Print integer:8
Print integer:7
Print integer:6
Print integer:23
Print integer:24
Print integer:25
return value:0
return value:0
No threads ready or runnable, and no pending interrupts.
Assuming the program completed.
Machine halting!
Ticks: total 300, idle 8, system 70, user 222
Disk I/O: reads 0, writes 0
Console I/O: reads 0, writes 0
Paging: faults 0
Network I/O: packets received 0, sent 0
已經恢復正常運作,看來只有contex switching的時候有問題,順利解決。
test1雖然覆蓋了test2的指令,而且也沒有看到Initializing data segment
(即initData.size
<= 0),那為什麼沒有連test2中n的內容都改變呢?換言之,test2的n在context switching的時候是儲存在甚麼地方,為甚麼沒有被改到?
在我看來n是屬於data的範疇,而不是code。但是從debug message看起來initData.size
<= 0。所以首先檢查initData.size
和uninitData.size
。於Load
函式中加上DEBUG觀察這兩個變數。
addrspace .cc :
AddrSpace::Load(char *fileName)
{
...
DEBUG(dbgTu, "initDataSize = " << noffH.initData.size << ", uninitDataSize = " << noffH.uninitData.size);
...
}
為此我也寫了一支程式來看看上述兩個data size究竟是甚麼代表意思。
test3.c :
#include <syscall.h>
int main()
{
int a = 0, b = 0, i = 0;
for(i = 0; i < 10; i += 1) a = i;
b = a + b;
return 0;
}
debug message:
...
initDataSize = 0, uninitDataSize = 0
...
代表這些變數根本就不是放在data那裡,緊接著的疑問是,那究竟甚麼樣的東西會放在data那邊呢?其實仔細想想main
本身就是一個函式,所以是不是nachos把main
中的所有東西都放到了code segment,而所謂的data放的是全域變數呢?
test3.c :
#include <syscall.h>
char tt3 = 'a';
short tt4;
int main()
{
int a = 0, b = 0, i = 0;
for(i = 0; i < 10; i += 1) a = i;
b = a + b;
return 0;
}
debug message:
...
initDataSize = 16, uninitDataSize = 32
...
這樣就印證了對於全域變數和data之間的關係的猜測是正確的,但是這個結果非但沒有解決最初的問題,反而還讓問題變得複雜。根據nachos的程式寫法,他把資料區分為data和code兩者,既然test2中的n不是放在data之中,那就一定是放在code之中了。既然如此,在context switching發生錯誤時,n的值不是更應該被更動嗎?
重新執行一次test1 test2
./nachos -e ../test/test2 -e ../test/test1
debug message:
Switching from: main to: test/test2
...
Switching from: test/test2 to: test/test1
...
Switching from: test/test1 to: test/test2
...
透過此debug message發現context switching程式碼的關鍵字有switch,所以想到可以利用這個關鍵字找到程式碼的位置。
grep -r "Switching" *
透過grep
找到此訊息出處為./code/threads/scheduler.cc
之中,而其中又有幾行耐人尋味的程式碼。
scheduler. cc :
void
Scheduler::Run (Thread *nextThread, bool finishing)
{
...
if (oldThread->space != NULL) { // if this thread is a user program,
oldThread->SaveUserState(); // save the user's CPU registers
oldThread->space->SaveState();
}
...
}
thread. cc :
void
Thread::SaveUserState()
{
for (int i = 0; i < NumTotalRegs; i++)
userRegisters[i] = kernel->machine->ReadRegister(i);
}
thread.h :
class Thread {
...
int userRegisters[NumTotalRegs]; // user-level CPU register state
...
}
這下終於找到了n到底存在哪裡,scheduler在變更thread的時候會先把舊的User Sate存到位於thread
的userRegisters
之中,執行新的thread時,會把User State從該thread
的userRegisters
中拿出來。而這幾個userRegisters
就是test1, test2各自的n所待的地方。
為了追蹤noffH.code.virtualAddr
,先從./code/threads/main.cc
中把debugArg
的值改成"a"
(代表開啟dbgAddr flag),這樣就可以在error output中看到debug message。另外也發現debug flag都放在./code/lib/debug.h
之中,而檔案最上方的註解也寫明鼓勵加入自己的flag。
透過此文章得知在指令中添加-d參數即可指定debug flag,這個方法明顯比我一開始嘗試的方法好很多。
例如:
./nachos -d a -e ../test/test2 -e ../test/test1
尋找檔案名時,利用find
指令(regular expression)。
例如:
user@user-VirtualBox:~/Documents/nachos/nachos-4.0/code$ find . -name "kernel.*"
./userprog/kernel.o
./threads/kernel.cc
./threads/kernel.o
./threads/kernel.h
./filesys/kernel.o
./network/kernel.o
尋找method在哪裡被用過,或尋找變數或function在哪裡被定義,可以利用grep
指令。
例如:
user@user-VirtualBox:~/Documents/nachos/nachos-4.0/code$ grep -r "Interrupt"
Binary file userprog/mipssim.o matches
Binary file userprog/kernel.o matches
userprog/synchconsole.cc:// Interrupt handler called when keystroke is hit; wake up
userprog/synchconsole.cc:// Interrupt handler called when it's safe to send the next
Binary file userprog/elevator.o matches
Binary file userprog/disk.o matches
userprog/exception.cc:// Interrupts (which can also cause control to transfer from user
Binary file userprog/machine.o matches
Binary file userprog/timer.o matches
Binary file userprog/userkernel.o matches
Binary file userprog/console.o matches
Binary file userprog/nachos matches
Binary file userprog/alarm.o matches
Binary file userprog/scheduler.o matches
Binary file userprog/thread.o matches
Binary file userprog/addrspace.o matches
Binary file userprog/synch.o matches
Binary file userprog/main.o matches
Binary file userprog/interrupt.o matches
Binary file userprog/exception.o matches
machine/timer.cc: SetInterrupt();
...
可能會跑出一堆亂七八糟的東西,此時利用指令pipeline的技巧,試著過濾出想要的東西。
user@user-VirtualBox:~/Documents/nachos/nachos-4.0/code$ grep -r "Interrupt" | grep "cc"
userprog/synchconsole.cc:// Interrupt handler called when keystroke is hit; wake up
userprog/synchconsole.cc:// Interrupt handler called when it's safe to send the next
userprog/exception.cc:// Interrupts (which can also cause control to transfer from user
machine/timer.cc: SetInterrupt();
machine/timer.cc: SetInterrupt(); // do last, to let software interrupt handler
machine/timer.cc:// Timer::SetInterrupt
machine/timer.cc:Timer::SetInterrupt()
machine/interrupt.cc:// PendingInterrupt::PendingInterrupt
machine/interrupt.cc:PendingInterrupt::PendingInterrupt(CallBackObj *callOnInt,
machine/interrupt.cc: callOnInterrupt = callOnInt;
machine/interrupt.cc:PendingCompare (PendingInterrupt *x, PendingInterrupt *y)
machine/interrupt.cc:// Interrupt::Interrupt
machine/interrupt.cc:// Interrupts start disabled, with no interrupts pending, etc.
machine/interrupt.cc:Interrupt::Interrupt()
machine/interrupt.cc: pending = new SortedList<PendingInterrupt *>(PendingCompare);
machine/interrupt.cc:// Interrupt::~Interrupt
machine/interrupt.cc:Interrupt::~Interrupt()
machine/interrupt.cc:// Interrupt::ChangeLevel
machine/interrupt.cc:Interrupt::ChangeLevel(IntStatus old, IntStatus now)
machine/interrupt.cc:// Interrupt::SetLevel
machine/interrupt.cc:Interrupt::SetLevel(IntStatus now)
...
Nachos Beginner's Guide
滑鐵盧大學 ppt
偉大的學長姐
也是偉大的學長姐
nachos debug教學
nachos document
助教的作業指示