Programming labs and resources at cmu 15-213, finished in my self-study.
- tags: computer architecture, computer system, C, assembly language
- study semester: 2015 Fall
- course homepage: 15-213 Introduction to Computer Systems
- Textbook: Computer Systems:A Programmer's Perspective (3rd edition)
- Course Videos
- Lab Assignments
- GDB:
- Norman Matloff's Unix and Linux Tutorial Center
How to operate integers and floating points in the bit-level?
howManyBits
int howManyBits(int x)
: return the minimum number of bits required to represent x in two's complement.- key: remove redundant sign bits at head, find the first appearance position of 1, binary search + conditional shift
Understand programs in the machine-level.
./bomb/strs
is my answer
- be familiar with gdb commands
- if-else, while loop
- switch, jump-table
- recursive function, shift
-
check the length of input string.
-
for loop, select chars from another string.
char *input = "your string";
char *str = 0x4024b0;
char *res = %rsp + 0x10;
for (int i = 0; i < 6; i++) {
int temp = input[i] & 0xf;
res[i] = str[temp];
}
- string match test
- input string to 6 numbers, which must be in [1, 6] and unique.
- number = 7 - number
- rearrange the linked list by the order of numbers:
int numbers[6]; // numbers = %rsp
struct node* list[6]; // list = %rsp + 0x20
for (int i = 0; i < 6; i++) {
%edx = head;
if (numbers[i] > 1) {
int temp = 1;
while (temp < numbers[i]) {
%edx = %edx->next;
temp++;
}
}
list[i] = %edx;
}
struct node *ptr = list[0];
for (int i = 1; i < 6; i++) {
ptr->next = list[i];
ptr = list[i];
}
list[5]->next = NULL;
- check the ordering of linked-list:
struct node *ptr = list[0];
for (int i = 5; i > 0; i--) {
int val = ptr->next->val;
if (ptr->val < val)
explode();
else
ptr = ptr->next;
}
Be careful about buffer overflow with rough memory writing.
input string = commands in bytes code + filled bytes + address of the commands
level | commands |
---|---|
2 | move cookie to %rdi , push &&touch2 , return |
3 | recover memory on stack (avoid being overwritten in hexmatch() ), move &str to %rdi , push &&touch3 , return |
The stack has uncertain address and is non-executable. Therefore, we are unable to overwrite the address of command string. However, the address of text/code is fixed. We can look into the machine code representation of functions, and extract valid parts of commands before return.
- level 2
- notice that in the machine code of
getval_280()
, we can retrieve the value from stack to%rax
.
00000000004019ca <getval_280>:
4019ca: b8 29 58 90 c3 mov $0xc3905829,%eax
4019cf: c3 ret
58: popq %rax
90: nop
c3: ret
- in the
setval_426()
function, we can transfer data to%rdi
, which is the parameter of touch2.
00000000004019c3 <setval_426>:
4019c3: c7 07 48 89 c7 90 movl $0x90c78948,(%rdi)
4019c9: c3 ret
48 89 c7: movq %rax, %rdi
90: nop
c3: ret
- build the string data from the stack architecture:
[Before gets]
return address of test()
buffer
<-- %rsp
[After gets]
touch2
addr_setval_426
cookie value
addr_getval_280 <-- original return address
...(any value to fill the buffer)
<-- %rsp
where addr_setval_426
and addr_getval_280
are the corresbonding addresses of gadgets.
- level 3
key problem: We cannot update %rsp by the given gadgets. In order to place the data upside the stack pointer, we need to calculate how many pop operations are required to calculate the address of hex string.
mov %rsp, %rax addval_190
mov %rax, $rdi setval_426
pop %rax getval_280
mov %eax, %edx addval_487
mov %edx, %ecx getval_159
mov %ecx, %esi addval_187
leaq (%rdi,%rsi,1), %rax add_xy
mov %rax, %rdi setval_426
#pop operations = 8 (#instructions) + 1 (pop %rax) = 9.
Optimize performance with undertandings of memory hierarchy.
- parse the command line arguments using
getopt
function For example, we have
> ./csim -h -v -s 4 -E 1 -b 2 -t example.txt
int opt, s, E, b;
while ((opt = getopt(argc, argv, "hvs:E:b:t:")) != -1) {
switch (opt) {
case 's':
s = atoi(optarg);
break;
case 'E':
E = atoi(optarg);
break;
case 'b':
b = atoi(optarg);
break;
case 't':
file = optarg;
break;
default:
printf("Unknown option: %c", optopt);
exit(1);
}
}
fscanf
function is used to parse the words in the text.- Extract the set index and tag bits from address (using mask).
- Each set can be implemented by a linked-list, where every node is a line.
Linked-list is efficient with add_head
and remove_node
operations:
- LRU (least-recently used) policy corresponds to the order in the linked-list. The most recently used block can be always placed in the head.
- Remove the tail node when the number of blocks is larger than E.
cache:
- 32 x 32 (Every 8 rows share the same set index)
basic idea: 8x8 sub-blocks + read 8 ints in a row of A, write 8 ints in a column of B enhancement: There will be extra conflict misses when read and write entries on the diagonal line, so we can use 8 int variables to save the row to registers or temporarily skip the assignments in the diag. misses: 288 < 300
- 64 x 64 (Every 4 rows share the same set index)
basic idea: similar to 32x32 case, divide the matrix into multiple 4x4 sub-blocks. Because of 8 ints in a set of cache, it is not efficient enough to use just block-wise transposes. enhancement: Transpose with 8x8 block with 4 4x4 sub-blocks:
---- ----
2 1
---- ----
3 4
---- ----
- Transpose the 2nd block first, leave 1st block transposed in the cache
- Transpose the 1st block while writing the previous entries of 1st block to 3rd block.
- Read a column from A's 3rd block to 4 registers
- Read a row from B's 1st block to the other 4 registers
- Write the column from A to the corresponding row in B's 1st block
- Write the row from B's 1st block to its 3rd block
- Transpose the 4th block
keys: Make full use of the cache from the previous step. misses: 1180 < 1300
- M = 61, N = 67 (Few conflict misses for no alignments)
basic idea: directly apply 8x8 blocks transpose misses: 1994 < 2000
Understand exception control flow by building a tiny shell.
A shell is an interactive command-line interpreter that runs programs on behalf of the user. A shell repeatedly prints a prompt, waits for a command line on stdin, and then carries out some action, as directed by the contents of the command line. The first word in the command line is either the name of a built-in command or the pathname of an executable file. The remaining words are command-line arguments.
Here are the ideas on how to implement the functions and build a process management system:
-
eval()
function:- parse the command line: break apart the whole line into pieces of arguments + decide fg or bg ?
- call
builtin_cmd()
: do all things and return if true, otherwise passes control to the process management part. - process management part: now the input is an executable pathname, call
fork()
andexecve()
to create a child process.
- foreground:
eval()
returns after the child is terminated, then callwait_fg()
to wait for it. - background: no need to wait, continue the parent process.
-
builtin_cmd()
function: just like basic procedures, it is called inside the shell process. if the command is:- "quit": just terminate the parent process (unreaped children will be handled by
init
process) - "jobs": call
listjobs()
safely (temporarily block all signals to avoid corruption) - "fg" or "bg": call
do_fgbg()
- other: error case, throw an error by emitting a message to stdout.
- "quit": just terminate the parent process (unreaped children will be handled by
-
do_fgbg()
function: continue the specified process.- parse the second argument, determine the actual process/job it refers to.
- three cases available: all send a
SIGCONT
signal first
- from
ST
toFG
: state change in the jobs list + waiting - from
BG
toFG
: state change in the jobs list + waiting (the running process will neglect theSIGCONT
signal by default) - from
ST
toBG
: only need to change the state
-
sigchld_handler
: the only way to reap the terminated or stopped child process The child process sends aSIGCHLD
signal to parent if it has stopped or terminated. Here are 3 cases:- terminate normally (
WIFEXITED(status) == 1
) - terminated by receiving a
SIGINT
signal, either from other processes orsigint_handler
- suspended by receiving a
SIGTSTP
signal, either from other processes orsigtstp_handler
The handler need to change to the proper state or delete the job from list. (ensure all
addjob()
are called beforedeletejob()
, by blockingSIGCHLD
signal just beforefork()
) The option isWNOHANG | WUNTRACED
to catch the pid of processes without waiting. - terminate normally (
-
wait_fg()
: The key iswait_fg
return after the fg process is reaped The foreground process is reaped in thesigchld_handler
, so do we determine whether it has been reaped? Claim: As long as the handler behaves correctly, the reaping activity is immediately followed bydeletejob()
. So when the job is no longer in jobs list, we can assume it has been reaped and then return.struct job_t* target_job = getjobpid(jobs, fg_pid); // block all signals temporarily while (target_job && target_job->state == FG) sleep(1);
But we cannot determine the order of
sigchld_handler
andwait_fg
, so the target job may be NULL (if handler comes first) or untiljob.state != FG
(we get the job pointer first, but need to wait for the state change). -
sigint_handler
andsigtstp_handler
: catch theSIGINT
orSIGTSTP
signal from keyboard (<Ctrl-c> or <Ctrl-z>), and pass the corresponding signal to the fg process group.
Implement a dynamic memory allocator with high utility and throughput.
In this approach, malloc package uses segregated double-linked list to record the current free blocks, which adopts the LIFO finding rule and coalesce principle.
-
Init: Extend heap by calling sbrk(). Initialize a number of head nodes in the segregated list, all free-lists share the same tail node (tail.prev doesn't matter) Initialize prologue and epilogue blocks and mark as allocated (reduce corner cases)
NOTE: The heap extends on demand (malloc, realloc). Therefore, the program doesn't allocate space at here.
-
Malloc: Find a free block at specific size-list, if not found, go to the next list with larger size, If after traversing lists and still fail to find a block to hold the required 'size' data, extend heap for 'size' bytes. When the block is found, delete this free block from seglist and place the block (mark as allocated + split)
-
Free: Unmark the allocated block + coalesce + add to the corresponding list
-
Realloc:
- new block size < old block size: shrink the current allocated block by calling place()
- new block size >= old block size + the next block on heap is free + the total size satisfy the requirement: merge the two blocks and place (next expansion strategy)
- new block size >= old block size + the next block is epilogue: extend heap for the exceeded bytes, change size of header and footer
- otherwise, simply calling free and malloc for new size
block structure
bp
free block | header: size_t 4 | prev: ptr 4 | next: ptr 4 | payload 8*N-8 | footer size_t 4|
bp
alloc block | header: size_t 4 | payload 8*N | footer size_t 4|
where 8*N
is the alignment of space that user wants to allocate.
segregated list
[index] block size (bytes)
[0] 2^4 = {16 - 24}
[1] 2^5 = {32 - 56}
[2] 2^6 = {64 - 120}
[3] 2^7 = {128 - 248}
...
[7] 2^11 = {2048 - 4088}
[8] >= 2^12
Build a tiny concurrent proxy with caches.
The proxy receives a positive integer from command line, use it as listening port to perform on all requests.
It's easy to build a sequential proxy based on the Tiny Server from textbook.
basic idea: The main thread parses the connection request, fetches response from the Web server and resends it back to client.
In part I, the core problem is to parse the request and headers successfuly:
- Read the first line as HTTP request
$METHOD $URI HTTP/$VERSION
. Validate the method and version. - Parse the URI
http://hostname:port/path
to hostname, port number and path. If no explicit port exists, use 80 as port number. - Read the remaining lines as headers until
'\r\n'
. - Connect to the server based on hostname and port.
- Rebuild the HTTP request
GET $PATH HTTP/1.0
and other important headers. Send them to the server socket.
The proxy uses shared buffer (sbuf.h
, sbuf.c
) and prethreading from textbook.
main thread: accept all connections from the listening port and add the fd to buffer.
other threads: get a fd from the buffer and perform all operations in part I.
The cache (see cache.h
, cache.c
) uses a doubly-linked list to manage multiple lines:
typedef struct line {
char *uri; // tag to identify the content
char *data; // entire response from the server last time
size_t size; // the size of data
struct line *prev, *next; // pointers to the other line
}
Some clients fetch responses that have been cached (case 1), so that they only call const methods which are thread-safe for no writing to the cache structure.
On the contrary, the others will update the internal structure of cache lines by calling non-const methods, which means the lines they request haven't been cached (case 2).
How to synchronize?
According to the relationship between requests and reading roles, we use the first readers-writers mechanism (multiple readers but only one writer can access the cache) to synchronize and protect the shared cache from multiple threads.
- case 1: pure reader
- lock before finding the line from buffer successfully, unlock after copying the data to private local buffer.
- case 2: partially reader + writer
- lock before finding the line it wants, and unlock after failing (as reader).
- lock before adding the data to cache and unlock after finishing adding (as real writer).
How to determine the eviction policy?
If we use the strict LRU policy. Every read and write operation will update the order of lines:
- read: rearrange the line to the head of list.
- write: create a new line and insert into the head.
So there is no const method for multiple readers to be served simultaneously.
Instead, we use an approximate LRU policy that don't update the order in reading operation and leave the line untouched.
Therefore, the find_line
method is safe for interleaves by multiple threads.