mirror of
https://github.com/CoderSherlock/CoderSherlock.github.io.git
synced 2026-06-13 08:08:10 -07:00
371 lines
14 KiB
Markdown
371 lines
14 KiB
Markdown
---
|
||
title: "Labs of CS350"
|
||
date: 2022-02-22 17:08:17 -0400
|
||
tags: ["Xv6", "Teaching", "Operating system", "Binghamton university"]
|
||
author: Pengzhan Hao
|
||
cover: '/static/2024-04/bu.png'
|
||
---
|
||
|
||
This will be a series regarding lab I gave during the spring 2022 semester.
|
||
|
||
The reason why I am writing this down is because it has been a week and no students ask for the solution of the last Lab.
|
||
I realise that learning gap between students are huge, especially when a non-profit university is admitting more and more students.
|
||
To help all students in understanding concepts of modern OS, I decided to write this post.
|
||
|
||
It starts with the past lab content I have (as the skelton), and will be amended with extra materials I think it helps.
|
||
Remember, it's for helping in learning. DON'T COPY & PASTE CODE!
|
||
|
||
## Index
|
||
[Lab1: Introduction of Makefile and Xv6.](#lab1-introduction)
|
||
[Lab3: System calls for process management.](#lab3-process)
|
||
[Lab4: Inter-processes communication.](#lab4-ipc)
|
||
[Lab6/7: CPU scheduling.](#lab6-7-scheduling)
|
||
|
||
## Lab1-Introduction
|
||
|
||
## Lab3-Process
|
||
|
||
## Lab4-IPC
|
||
|
||
## Lab6-7-Scheduling
|
||
|
||
### First user process in xv6
|
||
#### Kernel works
|
||
In xv6, as the same as conventional linux OS, the very first user-level process is **init**.
|
||
Before **init**'s running, all the OS bootstraps happen in a highly privileged mode(kernel level).
|
||
|
||
Xv6's kernel has the entry point as the main function located in the file *main.c*.
|
||
The main function invokes 17 functions to set up kernel page tables, interrupt handlers, I/O devices and etc.
|
||
When all kernel preparations are done, by calling the function ***userinit()***, kernel will boot up process init.
|
||
~~~c
|
||
int
|
||
main(void)
|
||
{
|
||
kinit1(end, P2V(4*1024*1024)); // phys page allocator
|
||
kvmalloc(); // kernel page table
|
||
mpinit(); // collect info about this machine
|
||
lapicinit();
|
||
seginit(); // set up segments
|
||
cprintf("\ncpu%d: starting xv6\n\n", cpu->id);
|
||
picinit(); // interrupt controller
|
||
ioapicinit(); // another interrupt controller
|
||
consoleinit(); // I/O devices & their interrupts
|
||
uartinit(); // serial port
|
||
pinit(); // process table
|
||
tvinit(); // trap vectors
|
||
binit(); // buffer cache
|
||
fileinit(); // file table
|
||
ideinit(); // disk
|
||
if(!ismp)
|
||
timerinit(); // uniprocessor timer
|
||
startothers(); // start other processors
|
||
kinit2(P2V(4*1024*1024), P2V(PHYSTOP)); // must come after startothers()
|
||
userinit(); // first user process
|
||
// Finish setting up this processor in mpmain.
|
||
mpmain();
|
||
}
|
||
~~~
|
||
It's tricky since that **init** is a user process, but kernel can't call any user-level system calls to create it.
|
||
Why? 1. Kernel has all privileges to create a user process. So it doesn't need to call system calls such as ***fork()***.
|
||
And 2. All other user processes can be created by forking from its parent.
|
||
Forking including clone the whole user virtual memory layout. However, the first process has no parent to fork from.
|
||
That's why it makes the creation of the first user process becomes so unique.
|
||
|
||
In *proc.c*, ***userinit()*** define there gives us the whole procedure of creating **init**.
|
||
Similar to the ***fork()***, but more simple.
|
||
Process control block(structures for storing the process status) was created at the very first by calling ***allocproc()***.
|
||
After then, by invoking ***setupkvm()***(defined in *vm.c*), kernel memory map was setup for the process.
|
||
During setting up kernel memory map, a page size virtual memory will be assigned to the process as ready.
|
||
And later, this page size memory will be used to store instructions of **init**.
|
||
|
||
Followed by setup kernel stack for the **init** process, calling ***inituvm()*** will load **init**'s text into the page that is just being allocated.
|
||
***inituvm()*** takes 3 arguments: a pointer to the process's page directory (p->pgdir),
|
||
a char-type pointer declared from external which point to **init**'s text segment(_binary_initcode_start), and
|
||
a char-type pointer which points to an external integer as the size of the **init**'s text segment(_binary_initcode_size).
|
||
Simply put, it will load instructions of **init** into the memory.
|
||
|
||
So now, the problem becomes when and where did instructions for **init** have compiled into the kernel?
|
||
~~~c
|
||
void
|
||
userinit(void)
|
||
{
|
||
struct proc *p;
|
||
extern char _binary_initcode_start[], _binary_initcode_size[];
|
||
|
||
p = allocproc();
|
||
initproc = p;
|
||
if((p->pgdir = setupkvm()) == 0)
|
||
panic("userinit: out of memory?");
|
||
inituvm(p->pgdir, _binary_initcode_start, (int)_binary_initcode_size);
|
||
p->sz = PGSIZE;
|
||
memset(p->tf, 0, sizeof(*p->tf));
|
||
p->tf->cs = (SEG_UCODE << 3) | DPL_USER;
|
||
p->tf->ds = (SEG_UDATA << 3) | DPL_USER;
|
||
p->tf->es = p->tf->ds;
|
||
p->tf->ss = p->tf->ds;
|
||
p->tf->eflags = FL_IF;
|
||
p->tf->esp = PGSIZE;
|
||
p->tf->eip = 0; // beginning of initcode.S
|
||
|
||
safestrcpy(p->name, "initcode", sizeof(p->name));
|
||
p->cwd = namei("/");
|
||
|
||
p->state = RUNNABLE;
|
||
}
|
||
~~~
|
||
#### Where the user-level code was integrated?
|
||
If you search the keyword "_binary_initcode_start" in the source code, you can't find any references.
|
||
The clue comes from the *Makefile*.
|
||
|
||
In the makefile, **initcode** is a prerequisites to compile the kernel image.
|
||
**Step 1**: Before kernel was compiled, *initcode.S* was first compiled to a runnable binary *initcode*.
|
||
This binary was very odd because it was not supposed to let any other OS to run it.
|
||
*Initcode.s* was first compiled without any standard including, and generating the intermediate file *initcode.o*.
|
||
|
||
**Step 2**: *Initcode.o* then linked to *Initcode.out* with two uncommon settings.
|
||
First it specify the entry of this binary file as when "start" symbol points to.
|
||
This "start" symbol was declared in the assembly code.
|
||
Second it specify a absolute address(0) for the text segments.
|
||
By doing this, text segments will be placed at the start of the binary file (except the header of the ELF)[^ldman].
|
||
|
||
**Step 3**: *Initcode.out* is already a minimized binary but it's not enough.
|
||
That's why when using **objcopy** to copy it to the file *initcode*, it further strip all headers and debug information[^objcopyman].
|
||
At this point, we have a minimal binary file *initcode*.
|
||
From the first byte of this file, it's only includes runnable instructions.
|
||
And the size of the file is only 44 bytes.
|
||
~~~bash
|
||
initcode: initcode.S
|
||
$(CC) $(CFLAGS) -nostdinc -I. -c initcode.S # Step 1
|
||
$(LD) $(LDFLAGS) -N -e start -Ttext 0 -o initcode.out initcode.o # Step 2
|
||
$(OBJCOPY) -S -O binary initcode.out initcode # Step 3
|
||
$(OBJDUMP) -S initcode.o > initcode.asm
|
||
~~~
|
||
|
||
This binary later were appended to the kernel using following commands.
|
||
And during this appending, 3 symbols were generated and added to the symbol table of the *kernel*[^ldman].
|
||
**"_binary_initcode_start"** contains the address of where the initcode segment was appended to.
|
||
**"_binary_initcode_end"** contains the address of where the initcode segment was ended at.
|
||
**"_binary_initcode_size"** is a \*ABS\* type symbol with value 0x2C(45) that specify the size of the initcode segment is 45 bytes.
|
||
~~~bash
|
||
kernel: $(OBJS) entry.o entryother initcode kernel.ld
|
||
$(LD) $(LDFLAGS) -T kernel.ld -o kernel entry.o $(OBJS) -b binary initcode entryother # <- This Line
|
||
$(OBJDUMP) -S kernel > kernel.asm
|
||
$(OBJDUMP) -t kernel | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$$/d' > kernel.sym
|
||
~~~
|
||
**In short summary**,
|
||
using objdump, we can verify that source code *initcode.S* has been compiled and loaded into the kernel.
|
||
Also the segment of initcode's instructions was located by the pointer "_binary_initcode_start".
|
||
That's explain when calling ***inituvm(p->pgdir, _binary_initcode_start, (int)_binary_initcode_size);***,
|
||
functionalities implemented in initcode.S will be loaded into the runtime of the first process within xv6.
|
||
~~~bash
|
||
# Header of the file kernel
|
||
kernel: file format elf32-i386
|
||
kernel
|
||
architecture: i386, flags 0x00000112:
|
||
EXEC_P, HAS_SYMS, D_PAGED
|
||
start address 0x0010000c
|
||
|
||
Program Header:
|
||
LOAD off 0x00001000 vaddr 0x80100000 paddr 0x00100000 align 2**12
|
||
filesz 0x00008c6a memsz 0x00008c6a flags r-x
|
||
...
|
||
Sections:
|
||
Idx Name Size VMA LMA File off Algn
|
||
0 .text 00008586 80100000 00100000 00001000 2**2
|
||
CONTENTS, ALLOC, LOAD, READONLY, CODE
|
||
...
|
||
SYMBOL TABLE:
|
||
...
|
||
8010b50c g .data 00000000 _binary_initcode_end
|
||
...
|
||
8010b4e0 g .data 00000000 _binary_initcode_start
|
||
...
|
||
0000002c g *ABS* 00000000 _binary_initcode_size
|
||
...
|
||
~~~
|
||
|
||
#### User-level code
|
||
|
||
Take a look of content in the *initcode.S*, you will find the code can explain itself well.
|
||
There are no other jobs but just calling system call **exec** to run a user-level binary **"init"**.
|
||
|
||
*Initcode.S*:
|
||
~~~bash
|
||
# Initial process execs /init.
|
||
|
||
#include "syscall.h"
|
||
#include "traps.h"
|
||
|
||
|
||
# exec(init, argv)
|
||
.globl start
|
||
start:
|
||
pushl $argv
|
||
pushl $init
|
||
pushl $0 // where caller pc would be
|
||
movl $SYS_exec, %eax
|
||
int $T_SYSCALL
|
||
|
||
# for(;;) exit();
|
||
exit:
|
||
movl $SYS_exit, %eax
|
||
int $T_SYSCALL
|
||
jmp exit
|
||
|
||
# char init[] = "/init\0";
|
||
init:
|
||
.string "/init\0"
|
||
|
||
# char *argv[] = { init, 0 };
|
||
.p2align 2
|
||
argv:
|
||
.long init
|
||
.long 0
|
||
~~~
|
||
|
||
The **"init"** mentioned above is not a pure user-level binary executable that compiled from the source code *init.c*.
|
||
Within *init.c*, a file named *console* will be created at the runtime for saving standard outputs and errors.
|
||
Then it will forked a child process(the second user process), and let it run program **"sh"**.
|
||
|
||
**"sh"** is the xv6's default shell, a user-level program that generated from source *sh.c*.
|
||
After the shell boots up, you can interactive with the xv6.
|
||
This's how first process (and second process) was started in the xv6.
|
||
|
||
*init.c*:
|
||
~~~c
|
||
// init: The initial user-level program
|
||
|
||
#include "types.h"
|
||
#include "stat.h"
|
||
#include "user.h"
|
||
#include "fcntl.h"
|
||
|
||
char *argv[] = { "sh", 0 };
|
||
|
||
int
|
||
main(void)
|
||
{
|
||
int pid, wpid;
|
||
|
||
if(open("console", O_RDWR) < 0){
|
||
mknod("console", 1, 1);
|
||
open("console", O_RDWR);
|
||
}
|
||
dup(0); // stdout
|
||
dup(0); // stderr
|
||
|
||
for(;;){
|
||
printf(1, "init: starting sh\n");
|
||
pid = fork();
|
||
if(pid < 0){
|
||
printf(1, "init: fork failed\n");
|
||
exit();
|
||
}
|
||
if(pid == 0){
|
||
exec("sh", argv);
|
||
printf(1, "init: exec sh failed\n");
|
||
exit();
|
||
}
|
||
while((wpid=wait()) >= 0 && wpid != pid)
|
||
printf(1, "zombie!\n");
|
||
}
|
||
}
|
||
~~~
|
||
|
||
[^ldman]: [ld\(1\) - Linux man page](https://linux.die.net/man/1/ld)
|
||
[^objcopyman]: [3 objcopy - binutils mannual](https://sourceware.org/binutils/docs/binutils/objcopy.html)
|
||
|
||
### Xv6's round robin schduler
|
||
|
||
The Scheduler is the core of an operating system.
|
||
With the scheduling of processes, the kernel can achieve near-real-time execution of multiple workloads.
|
||
The scheduling problem is also an active aspect of computer science research.
|
||
You can’t have one algorithm to fit all scenarios.
|
||
|
||
Xv6 by default has a round-robin scheduler.
|
||
It’s controlled using two-level for-loops, where the top-level for-loop is an endless loop that will keep the scheduler busy running.
|
||
The second-level nested for-loop will iterate a data structure named Ptable where all control information for processes is stored.
|
||
Information including pid, process name, etc. is stored in a structure called proc. Ptable is an array of processes.
|
||
Every runnable process in the Ptable will run strictly 1 time tick until the for-loop reached the last process in the Ptable.
|
||
Then it will loop back to the top-level for-loop for the next iteration of processes.
|
||
|
||
|
||
~~~c
|
||
// In file proc.c
|
||
struct {
|
||
struct spinlock lock;
|
||
struct proc proc[NPROC];
|
||
} ptable;
|
||
|
||
// In file proc.h
|
||
struct proc {
|
||
uint sz; // Size of process memory (bytes)
|
||
pde_t* pgdir; // Page table
|
||
char *kstack; // Bottom of kernel stack for this process
|
||
enum procstate state; // Process state
|
||
int pid; // Process ID
|
||
struct proc *parent; // Parent process
|
||
struct trapframe *tf; // Trap frame for current syscall
|
||
struct context *context; // swtch() here to run process
|
||
void *chan; // If non-zero, sleeping on chan
|
||
int killed; // If non-zero, have been killed
|
||
struct file *ofile[NOFILE]; // Open files
|
||
struct inode *cwd; // Current directory
|
||
char name[16]; // Process name (debugging)
|
||
};
|
||
~~~
|
||
|
||
~~~c
|
||
// In file proc.c
|
||
void
|
||
scheduler(void)
|
||
{
|
||
struct proc *p;
|
||
|
||
for(;;){
|
||
// Enable interrupts on this processor.
|
||
sti();
|
||
|
||
// Loop over process table looking for process to run.
|
||
acquire(&ptable.lock);
|
||
for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
|
||
if(p->state != RUNNABLE)
|
||
continue;
|
||
|
||
// Switch to chosen process. It is the process's job
|
||
// to release ptable.lock and then reacquire it
|
||
// before jumping back to us.
|
||
proc = p;
|
||
switchuvm(p);
|
||
p->state = RUNNING;
|
||
swtch(&cpu->scheduler, proc->context);
|
||
switchkvm();
|
||
|
||
// Process is done running for now.
|
||
// It should have changed its p->state before coming back.
|
||
proc = 0;
|
||
}
|
||
release(&ptable.lock);
|
||
|
||
}
|
||
}
|
||
~~~
|
||
|
||
It’s not hard to understand why this logic makes a round-robin manner.
|
||
This is very important to understand how to pick a process to run because scheduling is about always picking the appropriate process to achieve higher performance.
|
||
|
||
You can always come up with some new ideas for designing a good scheduler policy.
|
||
Understanding how to switch from one process to another is equivalently important.
|
||
|
||
Once the process for the next time tick is selected.
|
||
It’s time to switch from the running scheduler to the selected process. Wait for a second, there are two questions we haven’t answered.
|
||
1. What is the running scheduler?
|
||
2. How did the last running process stop running and give the CPU back to the scheduler?
|
||
|
||
|
||
|
||
|
||
### Lab
|
||
|
||
|