陳奕嘉
Merge sort is a classic sorting algorithm based on the divide-and-conquer strategy. It is a stable sort with a time complexity of O(n log n).
Merge sort works by dividing the sequence to be sorted into smaller subsequences, sorting each subsequence, and finally merging them to produce the complete sorted result.
Top-Down: use recursion to repeatedly divide the array until each subsequence has only one element, then merge these subsequences from small to large to gradually build the sorted array.
Bottom-Up: without recursion, start from the smallest subsequences and gradually build larger sorted subsequences until the entire array is sorted.
Top-Down:
1. Memory Access Pattern: recursive splitting jumps between different regions of the array, so accesses are irregular and scattered, raising the cache miss rate.
2. Memory Overhead from Recursion: every recursive call pushes a stack frame, adding memory traffic on top of the array data itself.
3. Merge Process: merges occur at many different recursion depths, so the subarrays being combined are touched in a non-sequential order.
Bottom-Up:
1. Memory Access Pattern: the Bottom-Up approach starts by merging subarrays of size 1 and gradually increases the size (e.g., 2, 4, 8, ...). Memory access patterns are typically sequential, making it more cache-efficient and easier to leverage prefetching mechanisms.
2. Merge Process: each pass sweeps the array from left to right, merging adjacent runs in order, so the merge work itself is contiguous.
Why the Bottom-Up approach is more cache-friendly:
1. Sequential Memory Access: each pass reads and writes the array contiguously.
2. Cache Prefetching: the predictable access pattern lets the hardware prefetcher load upcoming cache lines in time.
3. Recursion Overhead: no call stack is needed, which reduces cache pressure.
4. Data Reuse: adjacent runs merged in one pass are often still cached when the next pass reads them.
Feature | Top-Down Approach | Bottom-Up Approach |
---|---|---|
Memory Access Pattern | Irregular, scattered, higher cache miss rate | Sequential, contiguous, lower cache miss rate |
Cache Prefetching | Inefficient due to unpredictable patterns | Efficient due to sequential access |
Recursion Overhead | Requires stack memory, increases cache load | No recursion, reduces cache pressure |
Data Reuse | Low, due to non-contiguous subarray access | High, due to contiguous subarray access |
Best Use Cases | Small arrays or scattered data | Large arrays or high-performance requirements |
```c
#include <stdio.h>
#include <stdlib.h>

void merge(int *arr, int start, int mid, int end)
{
    int total = end - start + 1;
    // int *array = (int *)malloc(sizeof(int) * total);
    int array[total]; // variable-length array stored on the stack
    for (int i = 0; i < total; i++)
        array[i] = arr[i + start];

    int left_c = 0;                // current index in the left half
    int right_c = mid - start + 1; // current index in the right half
    int l_max = mid - start;       // last index of the left half
    int r_max = end - start;       // last index of the right half
    int output_c = start;          // write-back index in the original array

    while (left_c <= l_max && right_c <= r_max)
    {
        if (array[left_c] <= array[right_c])
            arr[output_c++] = array[left_c++];
        else
            arr[output_c++] = array[right_c++];
    }
    // Copy whatever remains in either half.
    while (left_c <= l_max)
        arr[output_c++] = array[left_c++];
    while (right_c <= r_max)
        arr[output_c++] = array[right_c++];
}

void mergesort(int *arr, int start, int end)
{
    if (start < end)
    {
        int mid = (start + end) / 2;
        mergesort(arr, start, mid);
        mergesort(arr, mid + 1, end);
        merge(arr, start, mid, end);
    }
}

int main()
{
    int arr[] = {7, -4, 10, -1, 3, 2, -6, 9};
    int size = 8;
    printf("Before Sort : ");
    for (int i = 0; i < size; i++)
        printf("%d , ", arr[i]);
    printf("\n");
    mergesort(arr, 0, size - 1);
    printf("After Sort : ");
    for (int i = 0; i < size; i++)
        printf("%d , ", arr[i]);
    printf("\n");
    return 0;
}
```
`merge` function:
1. Create a temporary array `array` to store the elements from `start` to `end` (to avoid directly modifying the original array, making merging easier).
2. Use `left_c` and `right_c` to track the current positions in the left and right subarrays, and `l_max` and `r_max` to mark the maximum indexes of the left and right subarrays.
3. If the element in the left subarray is smaller than or equal to the element in the right subarray, copy it back to the original array and advance `left_c`; otherwise, copy the element from the right subarray and advance `right_c`.
4. After the comparisons finish, copy any remaining elements from the left or right subarray back to the original array.
`mergesort` function:
1. If `start < end`, compute `mid = (start + end) / 2`.
2. Call `mergesort(arr, start, mid)` to sort the left subarray and `mergesort(arr, mid + 1, end)` to sort the right subarray, then combine the two halves with the `merge` function.
3. When `start >= end`, the subarray contains only one element (which is inherently sorted), so the recursion stops.
.data
size: .word 8
arr: .word 7, 4, 10, -1, 3, 2, -6, 9
str1: .string " Before Sort : "
str2: .string " After Sort : "
newline: .string "\n"
comma: .string " , "
.text
############## main ################
main:
addi sp, sp, -4
sw ra, 0(sp)
# print str1
la a0, str1
addi a7, x0, 4
ecall
# print current array
la a0, arr
lw a1, size
jal ra, printArray
# do MergeSort(&arr, int start, int end)
la a0, arr
addi a1, x0, 0
lw a2, size
addi a2, a2, -1
jal ra, mergeSort
# print str2
la a0, str2
addi a7, x0, 4
ecall
# print current array
la a0, arr
lw a1, size
jal ra, printArray
lw ra, 0(sp)
addi sp, sp, 4
addi a7, x0, 10
ecall
jalr x0, ra, 0
###########################################
############ mergeSort ##############
# s0 : &addr
# s1 : start
# s2 : end
# s3 : mid = (start+end)/2
mergeSort:
# sp : ra
# sp+4 : &addr
# sp+8 : start
# sp+12 : end
# sp+16 : mid
addi sp, sp, -20
sw ra, 0(sp)
# copy argument to saved register
addi s0, a0, 0 # s0 : &addr
addi s1, a1, 0 # s1 : start
addi s2, a2, 0 # s2 : end
# if(start<end), else go to label(endMergeSort)
bge s1, s2, endMergeSort
# use repeated subtraction to implement the divide
# s3 <= mid = (start+end)/2
add s3, s1, s2 # mid(s3) = start(s1) + end(s2)
addi s4, x0, 0 # s4 counts how many times 2 can be subtracted from s3
div:
blt s3, x0, endDiv
addi s3, s3, -2 # each time s3 - 2
addi s4, s4, 1 # each time s4 + 1
jal x0, div
endDiv:
addi s3, s4, -1 # s3 = mid : floor((start+end)/2)
# save &addr(s0), start(s1), end(s2), mid(s3) to mem
sw s0, 4(sp)
sw s1, 8(sp)
sw s2, 12(sp)
sw s3, 16(sp)
# prepare argument to do left mergeSort
# mergeSort(&arr(a0), start(a1), mid(a2))
addi a0, s0, 0 # a0 <= &addr (s0)
addi a1, s1, 0 # a1 <= start (s1)
addi a2, s3, 0 # a2 <= mid (s3)
jal ra, mergeSort
# prepare argument to do right mergeSort
# mergeSort(&arr(a0), mid(a1)+1, end(a2))
lw a0, 4(sp) # a0 <= &addr (sp+4)
lw a1, 16(sp) # a1 <= mid (sp+16)
addi a1, a1, 1 # a1 <= mid + 1
lw a2, 12(sp) # a2 <= end (sp+12)
jal ra, mergeSort
# prepare argument to do merge
# mergeSort(&arr(a0), start(a1), mid(a2), end(a3))
lw a0, 4(sp) # a0 <= &addr (sp+4)
lw a1, 8(sp) # a1 <= start (sp+8)
lw a2, 16(sp) # a2 <= mid (sp+16)
lw a3, 12(sp) # a3 <= end (sp+12)
jal ra, merge
endMergeSort:
lw ra, 0(sp)
addi sp, sp, 20
jalr x0, ra, 0
###########################################
########### merge ################
# a0 : &addr
# a1 : start
# a2 : mid
# a3 : end
merge:
# 0. we have left sorted array(start..mid) & right sorted array(mid+1..end)
# 1. copy array[start..end] from memory to stack
# 2. use 2 index (left: from 0, right : from mid-start+1) to compare left & right in stack
# 3. write the smaller back to memory, and the smaller index + 1
# 4. check whether either index has passed (>) its max index (left: mid-start, right: end-start)
# 5. if not, go back to step 2; else write the remaining left/right values back to memory
# calculate how many stack space we need
# t0 : total count of index => end - start + 1
# t1 : total space of memsize => t0*4
# t2 : 2's complement of t1 => -t1
sub t0, a3, a1 # total(t0) = end(a3) - start(a1)
addi t0, t0, 1 # total(t0) = end(a3) - start(a1) + 1
add t1, t0, t0 # space(t1) = total(t0)*2
add t1, t1, t1 # space(t1) = space(t1)*2 = total(t0)*4
xori t2, t1, -1 # 1's complement: flip every bit of t1 into t2
addi t2, t2, 1 # 2's complement(t2) => 1's complement(t2) + 1
add sp, sp, t2
# copy array[start..end] from memory to stack
# t2 : current index of new array(stack)
# t3 : current index of old array(memory)
addi t3, a1, 0 # memory index(t3) start from start(a1)
addi t2, x0, 0 # stack index(t2) start from 0
read2Stack:
blt a3, t3, endRead # continue while mem_index(t3) <= end(a3); else go to label(endRead)
add t4, t3, t3 # offset_space(t4) = mem_index(t3)*2
add t4, t4, t4 # offset_space(t4) = offset_space(t4)*2 = mem_index(t3)*4
add t4, a0, t4 # mem_addr(t4) = &addr(a0) + offset_space(t4)
lw t5, 0(t4) # t5 = *mem_addr(t4)
add t6, t2, t2 # offset_space(t6) = st_index(t2)*2
add t6, t6, t6 # offset_space(t6) = offset_space(t6)*2 = st_index(t2)*4
add t6, sp ,t6 # st_addr(t6) = initial_addr(sp) + offset_space(t6)
sw t5, 0(t6) # *st_addr(t6) = t5
addi t2, t2, 1 # st_index + 1
addi t3, t3, 1 # mem_index + 1
jal x0, read2Stack
endRead:
# do merge
# view stack as two parts(left sorted & right sorted), compare each, then store to memory from little to big
# prepare some variable
# t2 : left part index => 0
# t3 : right part index => mid(a2) - start(a1) + 1 => t4 + 1
# t4 : left terminate index(when t2 > t4) => mid(a2) - start(a1)
# t5 : right terminate index(when t3 > t5) => end(a3) - start(a1)
# t6 : write back index of old array(memory) => start(a1)
sub t4, a2, a1 # left max(t4) = mid(a2) - start(a1)
sub t5, a3, a1 # right max(t5) = end(a3) - start(a1)
addi t2, x0, 0 # left index(t2) = start form 0
addi t3, t4, 1 # right index(t3) = start from mid(a2) - start(a1) + 1
addi t6, a1, 0 # t6 : write back index from start(a1)
# if((left index <= left terminate) && (right index <= right terminate)), else go to label(endMergeLoop)
# t0 : boolean <= (left terminate(t4) < left index(t2))
# t1 : boolean <= (right terminate(t5) < right index(t3))
# if t0 or t1 is true, go to label(endMergeLoop)
# if neither => go on
# t0 : calculate the left mem, then load value in it
# t1 : calculate the right mem, then load value in it
# compare which is smaller, then save the smaller in old array(memory)
mergeLoop:
slt t0, t4, t2 # t0 = (left terminate(t4) < left index(t2))
slt t1, t5, t3 # t1 = (right terminate(t5) < right index(t3))
or t0, t0, t1 # t0 = t0 || t1
xori t0, t0, 0x1 # t0 = ~t0
beq t0, x0, endMergeLoop # if both indexes are still in range, continue; else go to label(endMergeLoop)
add t0, t2, t2 # offset_space(t0) = left_index(t2)*2
add t0, t0, t0 # offset_space(t0) = offset_space(t0)*2 = left_index(t2)*4
add t0, sp, t0 # left_addr(t0) = initial_addr(sp) + offset_space(t0)
lw t0, 0(t0) # left_value(t0) = *left_addr(t0)
add t1, t3, t3 # offset_space(t1) = right_index(t3)*2
add t1, t1, t1 # offset_space(t1) = offset_space(t1)*2 = right_index(t3)*4
add t1, sp, t1 # right_addr(t1) = initial_addr(sp) + offset_space(t1)
lw t1, 0(t1) # right_value(t1) = *right_addr(t1)
blt t1, t0, rightSmaller # if(left_value(t0)<=right_value(t1)), else go to label(rightSmaller)
add t1, t6, t6 # offset_space(t1) = mem_index(t6)*2
add t1, t1, t1 # offset_space(t1) = offset_space(t1)*2 = mem_index(t6)*4
add t1, a0, t1 # mem_addr(t1) = &addr(a0) + offset_space(t1)
sw t0, 0(t1) # *mem_addr(t1) = left_value(t0)
addi t6, t6, 1 # mem_index + 1
addi t2, t2, 1 # left_index + 1
jal x0, mergeLoop
rightSmaller:
add t0, t6, t6 # offset_space(t0) = mem_index(t6)*2
add t0, t0, t0 # offset_space(t0) = offset_space(t0)*2 = mem_index(t6)*4
add t0, a0, t0 # mem_addr(t0) = &addr(a0) + offset_space(t0)
sw t1, 0(t0) # *mem_addr(t0) = right_value(t1)
addi t6, t6, 1 # mem_index + 1
addi t3, t3, 1 # right_index + 1
jal x0, mergeLoop
endMergeLoop:
# if the left part still has elements, store the leftovers into the old array
# if the right part still has elements, store the leftovers into the old array
bge t5, t3, rightLoop # if(right index <= right terminate) => right part still have elements, go to label(rightLoop)
leftLoop: # else go on label(leftLoop)
add t0, t2, t2 # offset_space(t0) = left_index(t2)*2
add t0, t0, t0 # offset_space(t0) = offset_space(t0)*2 = left_index(t2)*4
add t0, sp, t0 # left_addr(t0) = initial_addr(sp) + offset_space(t0)
lw t0, 0(t0) # left_value(t0) = *left_addr(t0)
add t1, t6, t6 # offset_space(t1) = mem_index(t6)*2
add t1, t1, t1 # offset_space(t1) = offset_space(t1)*2 = mem_index(t6)*4
add t1, a0, t1 # mem_addr(t1) = &addr(a0) + offset_space(t1)
sw t0, 0(t1) # *mem_addr(t1) = left_value(t0)
addi t6, t6, 1 # mem_index + 1
addi t2, t2, 1 # left_index + 1
bge t4, t2, leftLoop # check left part whether still have elements
jal x0, endMerge
rightLoop:
add t1, t3, t3 # offset_space(t1) = right_index(t3)*2
add t1, t1, t1 # offset_space(t1) = offset_space(t1)*2 = right_index(t3)*4
add t1, sp, t1 # right_addr(t1) = initial_addr(sp) + offset_space(t1)
lw t1, 0(t1) # right_value(t1) = *right_addr(t1)
add t0, t6, t6 # offset_space(t0) = mem_index(t6)*2
add t0, t0, t0 # offset_space(t0) = offset_space(t0)*2 = mem_index(t6)*4
add t0, a0, t0 # mem_addr(t0) = &addr(a0) + offset_space(t0)
sw t1, 0(t0) # *mem_addr(t0) = right_value(t1)
addi t6, t6, 1 # mem_index + 1
addi t3, t3, 1 # right_index + 1
bge t5, t3, rightLoop # check right part whether still have elements
jal x0, endMerge
# revert stack space
# t0 : total count of index => end - start + 1
# t1 : total space of memsize => t0*4
endMerge:
sub t0, a3, a1 # total(t0) = end(a3) - start(a1)
addi t0, t0, 1 # total(t0) = end(a3) - start(a1) + 1
add t1, t0, t0 # space(t1) = total(t0)*2
add t1, t1, t1 # space(t1) = space(t1)*2 = total(t0)*4
add sp, sp, t1
jalr x0, ra, 0
###########################################
########### printArray ###############
printArray:
addi t0, a0, 0
addi t1, a1, 0
addi t2, x0, 0
printLoop:
add t3, t2, t2 # i+=i
add t3, t3, t3 # i+=i ==> i*4
add t3, t0, t3
lw a0, 0(t3)
addi a7, x0, 1
ecall
addi t2, t2, 1
bge t2, t1, endPrint
la a0, comma
addi a7, x0, 4
ecall
jal x0, printLoop
endPrint:
la a0, newline
addi a7, x0, 4
ecall
jalr x0, ra, 0
###########################################
`mergeSort` function: when `start >= end`, the recursion ends, as the subarray is already sorted. The midpoint (`mid`) is calculated using a simulated division operation. After splitting, the left and right subarrays are recursively sorted, and then the `merge` function is called to combine the two sorted subarrays.

`merge` function: the `merge` function is responsible for merging two sorted subarrays into a single sorted array. It begins by copying the data within the `start` to `end` range into the stack for temporary storage. Then, it uses two pointers to traverse the left and right subarrays, comparing the current elements from each subarray. The smaller element is written back to the original array, and the corresponding pointer is moved forward. If one subarray is exhausted, the remaining elements of the other subarray are directly copied back into the original array. After merging, the allocated stack space is released.
.data
size: .word 8
arr: .word 7, 4, 10, -1, 3, 2, -6, 9
str1: .string " Before Sort : "
str2: .string " After Sort : "
newline: .string "\n"
comma: .string " , "
.text
main:
addi sp, sp, -4
sw ra, 0(sp)
# print str1
la a0, str1
addi a7, x0, 4
ecall
# print current array
la a0, arr
lw a1, size
jal ra, printArray
# do iterative MergeSort
la a0, arr
lw a1, size
jal ra, iterativeMergeSort
# print str2
la a0, str2
addi a7, x0, 4
ecall
# print sorted array
la a0, arr
lw a1, size
jal ra, printArray
lw ra, 0(sp)
addi sp, sp, 4
addi a7, x0, 10
ecall
jalr x0, ra, 0
# Iterative MergeSort
# a0: array address
# a1: array size
iterativeMergeSort:
addi sp, sp, -24
sw ra, 0(sp)
sw s0, 4(sp) # array address
sw s1, 8(sp) # size
sw s2, 12(sp) # current width
sw s3, 16(sp) # left start
sw s4, 20(sp) # iteration counter
mv s0, a0 # save array address
mv s1, a1 # save size
# Start with merging subarrays of size 1
addi s2, x0, 1 # width starts from 1
width_loop:
bge s2, s1, end_sort # if width >= size, we're done
# For each width, merge subarrays
addi s3, x0, 0 # left = 0
left_loop:
# Calculate right start and end
add t0, s3, s2 # mid = left + width
add t1, t0, s2 # right_end = mid + width
# Ensure right_end doesn't exceed array size
bge t1, s1, adjust_end
j do_merge
adjust_end:
mv t1, s1 # right_end = size
do_merge:
# If there are elements to merge
bge s3, t0, next_iteration
# Prepare arguments for merge
mv a0, s0 # array address
mv a1, s3 # left start
addi a2, t0, -1 # mid - 1
addi a3, t1, -1 # right_end - 1
jal ra, merge
next_iteration:
add s3, s3, s2 # left += width
add s3, s3, s2 # left += width (total: left += 2*width)
blt s3, s1, left_loop
# Double the width for next iteration
add s2, s2, s2 # width *= 2
j width_loop
end_sort:
lw ra, 0(sp)
lw s0, 4(sp)
lw s1, 8(sp)
lw s2, 12(sp)
lw s3, 16(sp)
lw s4, 20(sp)
addi sp, sp, 24
jalr x0, ra, 0
# Merge function (same as original)
merge:
sub t0, a3, a1
addi t0, t0, 1
add t1, t0, t0
add t1, t1, t1
xori t2, t1, -1
addi t2, t2, 1
add sp, sp, t2
addi t3, a1, 0
addi t2, x0, 0
read2Stack:
blt a3, t3, endRead
add t4, t3, t3
add t4, t4, t4
add t4, a0, t4
lw t5, 0(t4)
add t6, t2, t2
add t6, t6, t6
add t6, sp, t6
sw t5, 0(t6)
addi t2, t2, 1
addi t3, t3, 1
j read2Stack
endRead:
sub t4, a2, a1
sub t5, a3, a1
addi t2, x0, 0
addi t3, t4, 1
addi t6, a1, 0
mergeLoop:
slt t0, t4, t2
slt t1, t5, t3
or t0, t0, t1
xori t0, t0, 0x1
beq t0, x0, endMergeLoop
add t0, t2, t2
add t0, t0, t0
add t0, sp, t0
lw t0, 0(t0)
add t1, t3, t3
add t1, t1, t1
add t1, sp, t1
lw t1, 0(t1)
blt t1, t0, rightSmaller
add t1, t6, t6
add t1, t1, t1
add t1, a0, t1
sw t0, 0(t1)
addi t6, t6, 1
addi t2, t2, 1
j mergeLoop
rightSmaller:
add t0, t6, t6
add t0, t0, t0
add t0, a0, t0
sw t1, 0(t0)
addi t6, t6, 1
addi t3, t3, 1
j mergeLoop
endMergeLoop:
bge t5, t3, rightLoop
leftLoop:
add t0, t2, t2
add t0, t0, t0
add t0, sp, t0
lw t0, 0(t0)
add t1, t6, t6
add t1, t1, t1
add t1, a0, t1
sw t0, 0(t1)
addi t6, t6, 1
addi t2, t2, 1
bge t4, t2, leftLoop
j endMerge
rightLoop:
add t1, t3, t3
add t1, t1, t1
add t1, sp, t1
lw t1, 0(t1)
add t0, t6, t6
add t0, t0, t0
add t0, a0, t0
sw t1, 0(t0)
addi t6, t6, 1
addi t3, t3, 1
bge t5, t3, rightLoop
endMerge:
sub t0, a3, a1
addi t0, t0, 1
add t1, t0, t0
add t1, t1, t1
add sp, sp, t1
jalr x0, ra, 0
# Print Array function (same as original)
printArray:
addi t0, a0, 0
addi t1, a1, 0
addi t2, x0, 0
printLoop:
add t3, t2, t2
add t3, t3, t3
add t3, t0, t3
lw a0, 0(t3)
addi a7, x0, 1
ecall
addi t2, t2, 1
bge t2, t1, endPrint
la a0, comma
addi a7, x0, 4
ecall
j printLoop
endPrint:
la a0, newline
addi a7, x0, 4
ecall
jalr x0, ra, 0
`iterativeMergeSort` function: the core of `iterativeMergeSort` lies in simulating the recursive divide-and-conquer process by using loops to gradually merge subarrays. Initially, the size of each subarray is set to 1, meaning it starts by merging single-element subarrays. As the iterations progress, the subarray size doubles (for example, from 1 to 2, then to 4, and so on) until the entire array is completely sorted. In each iteration, the program calculates the left and right boundaries of the current subarrays to be processed, ensuring the boundaries do not exceed the array's size, and then calls the `merge` function to combine the left and right subarrays into a single sorted subarray.

`merge` function: identical to the recursive version above.

| | Top-Down | Bottom-Up |
|---|---|---|
| Cycles | 2032 | 1616 |
| Instrs. retired | 1564 | 1280 |
| CPI | 1.3 | 1.26 |
| IPC | 0.77 | 0.792 |
| Cache setting | Value |
|---|---|
| Repl. policy | LRU |
| Wr. hit | Write-back |
| Wr. miss | Write allocate |
| | Top-Down | Bottom-Up |
|---|---|---|
| Hit rate | 0.9627 | 0.9521 |
| Hits | 232 | 139 |
| Misses | 9 | 7 |
Depth-First Search (DFS) is a classic algorithm for traversing a graph or tree. Its key characteristic is to explore as deeply as possible along one branch before backtracking to explore other unvisited branches. Recursive implementation is a common and natural way to implement DFS, as it leverages the function call stack to handle backtracking.
1. Stack Overflow: each recursive call consumes a stack frame, so on a deep graph (e.g., a long path) the recursion depth can exhaust the call stack.
2. Inefficient Memory Usage: every frame stores a return address and saved registers, far more than the single neighbor index an explicit stack would need per level.
```c
#include <stdio.h>
#include <stdlib.h>

// Node structure for adjacency list
struct Node {
    int dest;
    struct Node* next;
};

// Structure to represent an adjacency list
struct AdjList {
    struct Node* head;
};

// Function to create a new adjacency list node
struct Node* createNode(int dest) {
    struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
    newNode->dest = dest;
    newNode->next = NULL;
    return newNode;
}

// Recursive function for DFS traversal
void DFSRec(struct AdjList adj[], int visited[], int s) {
    // Mark the current vertex as visited
    visited[s] = 1;
    // Print the current vertex
    printf("%d ", s);
    // Traverse all adjacent vertices that are not visited yet
    struct Node* current = adj[s].head;
    while (current != NULL) {
        int dest = current->dest;
        if (!visited[dest]) {
            DFSRec(adj, visited, dest);
        }
        current = current->next;
    }
}

// Main DFS function that initializes the visited array
// and calls DFSRec
void DFS(struct AdjList adj[], int V, int s) {
    // Initialize the visited array (sized by V, not hard-coded)
    int visited[V];
    for (int i = 0; i < V; i++)
        visited[i] = 0;
    DFSRec(adj, visited, s);
}

// Function to add an edge to the adjacency list
void addEdge(struct AdjList adj[], int s, int t) {
    // Add edge from s to t
    struct Node* newNode = createNode(t);
    newNode->next = adj[s].head;
    adj[s].head = newNode;
    // Add edge from t to s, since the graph is undirected
    newNode = createNode(s);
    newNode->next = adj[t].head;
    adj[t].head = newNode;
}

int main() {
    int V = 5;
    // Create an array of adjacency lists
    struct AdjList adj[V];
    // Initialize each adjacency list as empty
    for (int i = 0; i < V; i++) {
        adj[i].head = NULL;
    }
    int E = 5;
    // Define the edges of the graph
    int edges[][2] = {{1, 2}, {1, 0}, {2, 0}, {2, 3}, {2, 4}};
    // Populate the adjacency list with edges
    for (int i = 0; i < E; i++) {
        addEdge(adj, edges[i][0], edges[i][1]);
    }
    int source = 1;
    printf("DFS from source: %d\n", source);
    DFS(adj, V, source);
    printf("\n");
    return 0;
}
```
`createNode` function:
The `createNode` function is responsible for creating a new node. It takes the destination node number `dest` as a parameter, allocates memory for a new `Node` structure, sets `dest` as the value of the node, and initializes its `next` pointer to `NULL`. Finally, it returns the pointer to the newly created node.

`DFSRec` function:
The `DFSRec` function implements the recursive depth-first search. Starting from the given node `s`, it marks the node as visited and prints it. Then, it iterates through all the neighbors of node `s`. For each unvisited neighbor, it recursively calls `DFSRec` until all nodes reachable from `s` have been visited.

`DFS` function:
The `DFS` function is the entry point for the depth-first search. It initializes a visited array `visited` to track which nodes have been visited, and then calls the recursive `DFSRec` function, starting the traversal from the specified source node `s`.
.data
# 5x5 adjacency matrix for the graph
adj_matrix:
.word 0,1,1,0,0 # node 0's connections
.word 1,0,1,0,0 # node 1's connections
.word 1,1,0,1,1 # node 2's connections
.word 0,0,1,0,0 # node 3's connections
.word 0,0,1,0,0 # node 4's connections
visited: .word 0,0,0,0,0 # visited array
msg1: .string "DFS starting from node: "
arrow: .string " -> "
newline: .string "\n"
.text
.global main
main:
# Print start message
la a0, msg1
li a7, 4
ecall
# Print starting node (1)
li a0, 1
li a7, 1
ecall
la a0, newline
li a7, 4
ecall
# Initialize visited array
la t0, visited
sw zero, 0(t0)
sw zero, 4(t0)
sw zero, 8(t0)
sw zero, 12(t0)
sw zero, 16(t0)
# Start DFS from node 1
li a0, 1
jal dfs
# Print final newline
la a0, newline
li a7, 4
ecall
# Exit program
li a7, 10
ecall
# DFS function
dfs:
# Save return address and s0-s2
addi sp, sp, -16
sw ra, 12(sp)
sw s0, 8(sp)
sw s1, 4(sp)
sw s2, 0(sp)
# Save current node in s0
mv s0, a0
# Check if already visited
la t0, visited
slli t1, s0, 2
add t1, t0, t1
lw t2, 0(t1)
bnez t2, dfs_return # if already visited, return
# Mark as visited
li t2, 1
sw t2, 0(t1)
# Print current node
mv a0, s0
li a7, 1
ecall
# Set up for neighbor checking
mv s1, zero # Initialize neighbor counter
li s2, 0 # Flag to track if we found any unvisited neighbors
check_neighbors:
li t0, 5
beq s1, t0, after_neighbors # If checked all neighbors, go to after_neighbors
# Calculate address of current edge in adjacency matrix
la t0, adj_matrix
li t1, 20 # 5 nodes * 4 bytes per row
mul t1, s0, t1 # Get row offset
add t0, t0, t1 # Get row address
slli t1, s1, 2 # Get column offset
add t0, t0, t1 # Get edge address
lw t1, 0(t0) # Load edge value
# If no edge to this neighbor, skip
beqz t1, next_neighbor
# Check if neighbor is unvisited
la t0, visited
slli t1, s1, 2
add t0, t0, t1
lw t1, 0(t0)
bnez t1, next_neighbor # If visited, skip
# Print arrow before visiting new node
la a0, arrow
li a7, 4
ecall
# Visit neighbor through recursive call
mv a0, s1
li s2, 1 # Set flag indicating we found an unvisited neighbor
jal dfs
next_neighbor:
addi s1, s1, 1 # Increment neighbor counter
j check_neighbors
after_neighbors:
beqz s2, dfs_return # If no unvisited neighbors were found, return without printing arrow
dfs_return:
# Restore saved registers
lw s2, 0(sp)
lw s1, 4(sp)
lw s0, 8(sp)
lw ra, 12(sp)
addi sp, sp, 16
ret
`main` function:
The `main` function serves as the program's entry point and is responsible for overall control flow. It starts by printing an initial message and initializing the `visited` array to mark all nodes as unvisited. Then, it begins the depth-first search (DFS) traversal by calling the `dfs` function starting from node 1. After the traversal is complete, the `main` function prints a newline and terminates the program using a system call.

`dfs` function:
The `dfs` function implements the recursive depth-first search. Upon entering the function, it saves the current node and register states onto the stack to preserve the context during recursive calls. It then checks whether the current node has already been visited. If not, it marks the node as visited and prints its value. The function then iterates over the neighbors of the current node. For any unvisited neighbor, it recursively calls `dfs` to continue the traversal. Once all neighbors of the current node are processed, the function restores the saved register states and returns to the previous level.

Neighbor Checking Logic:
The neighbor-checking logic, embedded within the `dfs` function, is responsible for iterating over all neighbors of the current node. Using the adjacency matrix, the function calculates each neighbor's position and checks whether there is an edge connection (matrix value of 1) and whether the neighbor is unvisited. If both conditions are met, the program outputs an arrow symbol to indicate a connection and recursively calls `dfs` to traverse that neighbor. Once all neighbors are processed, the checking logic ends, and `dfs` starts to backtrack.
.data
# 5x5 adjacency matrix for the graph
adj_matrix:
.word 0,1,1,0,0 # node 0's connections
.word 1,0,1,0,0 # node 1's connections
.word 1,1,0,1,1 # node 2's connections
.word 0,0,1,0,0 # node 3's connections
.word 0,0,1,0,0 # node 4's connections
visited: .word 0,0,0,0,0 # visited array
msg1: .string "DFS starting from node: "
arrow: .string " -> "
newline: .string "\n"
.text
.global main
main:
# Print start message
la a0, msg1
li a7, 4
ecall
# Print starting node (1)
li a0, 1
li a7, 1
ecall
la a0, newline
li a7, 4
ecall
# Initialize visited array
la t0, visited
sw zero, 0(t0)
sw zero, 4(t0)
sw zero, 8(t0)
sw zero, 12(t0)
sw zero, 16(t0)
# Start optimized DFS from node 1
li a0, 1
jal optimized_dfs
# Print final newline
la a0, newline
li a7, 4
ecall
# Exit program
li a7, 10
ecall
# Optimized DFS implementation
# a0: current node
optimized_dfs:
# Save return address and s registers
addi sp, sp, -16
sw ra, 12(sp)
sw s0, 8(sp)
sw s1, 4(sp)
sw s2, 0(sp)
# Save current node in s0
mv s0, a0
# Mark as visited and print current node
la t0, visited
slli t1, s0, 2
add t1, t0, t1
li t2, 1
sw t2, 0(t1) # Mark visited
mv a0, s0 # Print node
li a7, 1
ecall
# Initialize neighbor counter
li s1, 0 # Current neighbor
check_neighbors:
li t0, 5
beq s1, t0, dfs_return # If checked all neighbors, return
# Load adjacency info (cached in register)
la t0, adj_matrix
li t1, 20 # 5 nodes * 4 bytes
mul t1, s0, t1 # Row offset
add t0, t0, t1
slli t1, s1, 2 # Column offset
add t0, t0, t1
lw t2, 0(t0) # Load edge value
# If no edge or already visited, skip
beqz t2, next_neighbor
# Check if neighbor is visited (reuse previously loaded visited array address)
la t0, visited
slli t1, s1, 2
add t1, t0, t1
lw t2, 0(t1)
bnez t2, next_neighbor
# Found unvisited neighbor - print arrow and visit
la a0, arrow
li a7, 4
ecall
# Recursively visit neighbor
mv a0, s1
jal optimized_dfs
next_neighbor:
addi s1, s1, 1
j check_neighbors
dfs_return:
# Restore registers and return
lw s2, 0(sp)
lw s1, 4(sp)
lw s0, 8(sp)
lw ra, 12(sp)
addi sp, sp, 16
ret
`main` function:
The `main` function serves as the entry point of the program and coordinates the execution flow. It starts by printing a message indicating the start of DFS traversal and initializes the `visited` array, marking all nodes as unvisited. It then begins the traversal from node 1 by invoking the `optimized_dfs` function. Once the traversal is complete, the function prints a newline and exits the program using a system call.

`optimized_dfs` function:
The `optimized_dfs` function is the recursive implementation of depth-first search. It begins by saving the return address and relevant registers onto the stack to preserve the context during recursion. The current node is then marked as visited in the `visited` array, and its value is printed. The function iterates through all potential neighbors of the current node by consulting the adjacency matrix. If a neighbor is connected (an edge exists) and unvisited, the function prints an arrow symbol to indicate traversal and recursively calls itself to explore the neighbor. Once all neighbors are processed, the function restores the saved register values from the stack and returns to the previous level of recursion.

Neighbor Checking Logic:
The neighbor-checking logic is embedded within the `optimized_dfs` function. It calculates the position of each neighbor in the adjacency matrix by determining the row and column offsets for the current node. For each neighbor, it checks whether there is a connecting edge and whether the neighbor has already been visited. If both conditions are satisfied, the neighbor is identified as an unvisited connection, and the traversal proceeds to recursively explore it. This logic ensures that all reachable nodes are explored while avoiding revisits to already visited nodes.
`Opt DFS`:
# Calculate the base address of the visited array and store it in t0
la t0, visited
slli t1, s0, 2 # Calculate the offset for the current node in visited
add t1, t0, t1 # Compute the access address
li t2, 1
sw t2, 0(t1) # Mark the current node as visited
The base address of `visited` (`la t0, visited`) is calculated only once and stored in a register; subsequent operations reuse this base address, reducing redundant calculations. The offset for the current node (`s0`) is likewise calculated once and used directly in the subsequent memory access.
`DFS`:
# Recalculate the base address of the visited array for every neighbor
la t0, visited
slli t1, s1, 2 # Calculate the offset for the neighbor node in visited
add t0, t0, t1 # Compute the access address
lw t1, 0(t0) # Check if it is already visited
The base address of `visited` is recalculated for every neighbor, leading to unnecessary computations and additional memory accesses.

`Opt DFS`:
# Calculate the base address of the adjacency matrix row
la t0, adj_matrix
li t1, 20 # Each row has 5 nodes, each node takes 4 bytes
mul t1, s0, t1 # Compute the row offset for the current node
add t0, t0, t1 # Compute the row base address (only calculated once)
`DFS`:
# Recalculate the row base address for every neighbor
la t0, adj_matrix
li t1, 20 # Each row has 5 nodes, each node takes 4 bytes
mul t1, s0, t1 # Compute the row offset for the current node
add t0, t0, t1 # Compute the row base address
slli t1, s1, 2 # Compute the column offset
add t0, t0, t1 # Compute the final address
lw t2, 0(t0) # Read the adjacency value
`Opt DFS`:
Values in the `visited` array are reused multiple times, exhibiting temporal locality in cache usage. Accesses to the `visited` array are predictable and concentrated, resulting in a high cache hit rate.

`DFS`:
Base addresses are recomputed for every neighbor, so accesses are more scattered and less able to exploit that locality.

`Opt DFS`:
# Efficiently skip already visited neighbors
la t0, visited
slli t1, s1, 2
add t1, t0, t1
lw t2, 0(t1) # Check if the neighbor is already visited
bnez t2, next_neighbor
`DFS`:
# Mixed conditional checks and redundant calculations
la t0, adj_matrix
li t1, 20 # Each row has 5 nodes, each node takes 4 bytes
mul t1, s0, t1 # Calculate row offset
add t0, t0, t1
slli t1, s1, 2 # Calculate column offset
add t0, t0, t1
lw t1, 0(t0) # Check adjacency value
beqz t1, next_neighbor
| | DFS | Opt DFS |
|---|---|---|
| Cycles | 777 | 744 |
| Instrs. retired | 579 | 555 |
| CPI | 1.34 | 1.34 |
| IPC | 0.745 | 0.746 |
`Opt DFS` consumes fewer CPU cycles because it eliminates redundant calculations and minimizes memory accesses by efficiently reusing values stored in registers. For example, the base address of the adjacency-matrix row for a node is computed only once and stored in a register, which is then reused for all neighbor checks. This avoids recalculating the same base address multiple times, as seen in `DFS`. Similarly, the base address of the `visited` array is calculated once and reused, reducing the overhead associated with repeated address computations.

In `DFS`, every neighbor check recalculates the row base address and column offset for the adjacency matrix and re-accesses the `visited` array's base address. These repeated operations increase the number of executed instructions, leading to more cycles. Additionally, `Opt DFS`'s streamlined structure allows for better instruction-level parallelism, as the CPU can execute fewer dependent instructions in sequence.

By reducing redundant operations and leveraging efficient memory and register usage, `Opt DFS` lowers the total instruction count and improves execution efficiency, directly reducing the number of CPU cycles required.
| Cache setting | Value |
|---|---|
| Repl. policy | LRU |
| Wr. hit | Write-back |
| Wr. miss | Write allocate |
| | DFS | Opt DFS |
|---|---|---|
| Hit rate | 0.8667 | 0.8588 |
| Hits | 78 | 73 |
| Misses | 12 | 12 |
It's evident that while `Opt DFS` is optimized, its cache hit rate is slightly lower than that of `DFS`. This is because `Opt DFS`'s optimization focuses on significantly reducing the total number of memory accesses, whereas `DFS` performs far more frequent memory accesses due to redundant calculations and repeated base-address recalculations. These additional memory accesses in `DFS` inflate its cache hit count and hit rate, but they also introduce unnecessary overhead.

On the other hand, both implementations have the same number of cache misses, which suggests that both encounter memory blocks that cannot fully benefit from the cache. This is likely because the adjacency matrix and the `visited` array involve memory blocks that lack spatial locality, leading to some unavoidable cache misses.

The critical difference lies in the total number of memory accesses. `Opt DFS` reduces the number of accesses by reusing computed addresses stored in registers, while `DFS` repeatedly calculates addresses and accesses memory for each neighbor. As a result, `Opt DFS` performs fewer overall memory accesses and therefore consumes fewer CPU cycles, making it more efficient than `DFS`.