# Memory and Storage Systems
###### tags: `Master`
## Assignment II: Valgrind & PyTorch profiler
## 310611087 王興勝
### Q1. Memcheck
**Question**: There are 6 errors in total [5% per error]. Please find the errors in the log and annotate what each error is, as shown in the picture below (one of the answers). Additionally, please take a screenshot and briefly explain in the report what caused each error, or what code would cause it.
**1. Invalid write (Memory write out of bounds)**

This error occurs when the code writes to memory it is not allowed to access, for example past the end of an allocated array. The result is undefined behavior, which can corrupt other data or crash the program.
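```cpp!
#include <cstdlib>  // malloc, free

int main()
{
    // Room for 8 ints, so the valid indices are 0 to 7.
    int *x = static_cast<int *>(malloc(8 * sizeof(int)));
    x[9] = 0;  // intentional bug: writes past the end of the block (Memcheck: "Invalid write of size 4")
    free(x);
    return 0;
}
```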
**2. Invalid read (Memory read out of bounds)**

This error occurs when the code reads from memory it is not allowed to access, for example past the end of an allocated array. The value read is unpredictable (undefined behavior), and the program may crash.
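```cpp!
#include <cstdlib>   // malloc, free
#include <iostream>  // std::cout

int main()
{
    // Room for 8 ints, so the valid indices are 0 to 7.
    int *x = static_cast<int *>(malloc(8 * sizeof(int)));
    std::cout << x[9] << '\n';  // intentional bug: reads past the end of the block (Memcheck: "Invalid read of size 4")
    free(x);
    return 0;
}
```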
**3. Use of uninitialized memory**

This error occurs when a conditional statement (such as if-else or switch) depends on the value of an uninitialized variable. Memcheck reports it as "Conditional jump or move depends on uninitialised value(s)".
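```cpp!
#include <cstdio>  // printf

int main()
{
    int x;  // intentional bug: x is never initialized
    // The branch depends on x's garbage value
    // (Memcheck: "Conditional jump or move depends on uninitialised value(s)").
    if (x == 0)
    {
        printf("X is zero\n");
    }
    return 0;
}
```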
**4. Use of uninitialized value**

This error occurs when the code reads or manipulates the value of an uninitialized variable directly. A variable that is declared but never assigned holds garbage data, so using it (in the example below, as a pointer to write through) gives unpredictable results.
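```cpp!
#include <cstdlib>  // malloc, free

void f()
{
    int *p;    // intentional bug: the pointer is never initialized
    p[0] = 1;  // writes through a garbage address (Memcheck on 64-bit: "Use of uninitialised value of size 8")
}

int main()
{
    // C++ needs an explicit cast from malloc's void* result.
    int *array = static_cast<int *>(malloc(10 * sizeof(int)));
    f();
    free(array);
    return 0;
}
```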
**5. Fishy argument values**

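Every memory allocation function takes an argument giving the size of the block to allocate. The requested size should be non-negative and is usually not excessively large, so a size that is negative when read as a signed integer almost always comes from an erroneous size calculation. Memcheck calls such a value "fishy" and reports, for example, "Argument 'size' of function malloc has a fishy (possibly negative) value".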
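
A minimal sketch of how a fishy size typically arises, assuming a buggy element count computed as `end - begin` with `end < begin`. The helpers `element_count`, `looks_fishy` and `alloc_ints` are hypothetical names used only for illustration. The negative `int` is converted to the unsigned `size_t` that `malloc` expects, so the request becomes enormous; calling `alloc_ints(5, 2)` under `valgrind --tool=memcheck` should therefore trigger the fishy-value report, and `malloc` is expected to fail and return a null pointer. `looks_fishy` only mirrors the idea described above (a size that is negative when read as signed); it is not Valgrind's own code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>

// Hypothetical buggy size calculation: negative whenever end < begin.
int element_count(int begin, int end)
{
    return end - begin;
}

// Mirrors the idea of the check (not Memcheck's actual code): a size whose
// bit pattern is negative when reinterpreted as a signed integer is "fishy".
bool looks_fishy(std::size_t size)
{
    return static_cast<std::make_signed_t<std::size_t>>(size) < 0;
}

// Allocates room for element_count(begin, end) ints.
int *alloc_ints(int begin, int end)
{
    int count = element_count(begin, end);
    // Intentional bug: count is not checked for being negative, so -3 becomes
    // a huge size_t and the byte count wraps around to a fishy value.
    std::size_t bytes = static_cast<std::size_t>(count) * sizeof(int);
    return static_cast<int *>(std::malloc(bytes));
}
```

Checking `count < 0` before converting it to `size_t` removes the fishy request entirely.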
**6. Illegal frees (Double frees)**

This error occurs when the program frees memory that was never allocated, or that has already been freed. Memcheck reports it as "Invalid free() / delete / delete[] / realloc()".
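```cpp!
#include <cstdlib>  // malloc, free

int main()
{
    int *x = static_cast<int *>(malloc(8 * sizeof(int)));
    // The first block's only pointer is overwritten here, so that block also leaks.
    x = static_cast<int *>(malloc(8 * sizeof(int)));
    free(x);
    free(x);  // intentional bug: the same block is freed twice
    return 0;
}
```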
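
One common way to avoid both the double free and the lost pointer in the program above is to keep one pointer per block, free each block exactly once, and reset the pointer to `nullptr` afterwards, because `free(nullptr)` is defined to do nothing. This is a sketch of that pattern, not the assignment's reference fix; `free_and_clear` and `allocate_and_release` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: frees the block and clears the caller's pointer,
// so an accidental second call becomes free(nullptr), which is a no-op.
void free_and_clear(int *&p)
{
    std::free(p);
    p = nullptr;
}

// Keeps a separate pointer to each block, so nothing leaks,
// and releases each block exactly once.
bool allocate_and_release()
{
    int *a = static_cast<int *>(std::malloc(8 * sizeof(int)));
    int *b = static_cast<int *>(std::malloc(8 * sizeof(int)));
    assert(a != nullptr && b != nullptr);  // sketch: no real error handling
    free_and_clear(a);
    free_and_clear(b);
    free_and_clear(b);  // harmless: b is already nullptr
    return a == nullptr && b == nullptr;
}
```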
### Q2. Cachegrind
**Question**: Please take screenshots of the two logs, point out where they differ, and explain in the report why this problem occurs.
**1. good_log**

**2. bad_log**

The difference is in the D1 misses, i.e. the misses in the level-1 data cache. When the CPU needs data, it checks the caches in hierarchical order: L1 first, then the last-level cache (LL). Every D1 miss therefore becomes an LL reference, so a higher D1 miss count also raises the LL refs (Last-Level Cache references) count.
Cache misses have several causes: cold (compulsory) misses on the first access to a block, capacity misses when the working set is larger than the cache, and conflict misses or cache thrashing when blocks repeatedly evict each other.
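
The logs themselves are not reproduced here, so the cause in the assignment's program cannot be confirmed from this text. As an assumed illustration of one common cause, the sketch below sums the same row-major matrix in two orders (function names are hypothetical). `sum_row_major` walks memory contiguously, so each cache line (typically 64 bytes) serves several consecutive elements; `sum_column_major` jumps a whole row ahead on every access, so for a large matrix almost every access can miss in D1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// An n x n matrix stored row-major in one contiguous vector:
// element (i, j) lives at index i * n + j.

// Cache-friendly: the inner loop visits consecutive addresses.
long long sum_row_major(const std::vector<int> &m, std::size_t n)
{
    assert(m.size() == n * n);
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            total += m[i * n + j];
    return total;
}

// Cache-unfriendly: the inner loop strides by n elements, so once a row
// spans at least one cache line, each access touches a different line.
long long sum_column_major(const std::vector<int> &m, std::size_t n)
{
    assert(m.size() == n * n);
    long long total = 0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            total += m[i * n + j];
    return total;
}
```

Both functions return the same sum. For a matrix too large to fit in D1, calling each one from a small driver and running it under `valgrind --tool=cachegrind` should show many more D1 misses for the column-order version.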
### Q3. Massif
**Question**: In your report, please take a screenshot of the output graph (including the Number of snapshots and Detailed snapshots list that follow it). At the peak, point out how many bytes of heap are allocated by malloc in the .c file, and how many total bytes are used by the heap.

At the peak, **3344 KB** of heap were allocated by malloc in heap.c, and the heap used **4867 KB** in total.
### Q4. Callgrind
**Question1** [10%]: Using the KCachegrind GUI, first find the bottleneck function. Take a screenshot of this function's call graph including its lower layers, then indicate which function is the most expensive and which function is called the most times in the call graph.
```
The bottleneck function is run_bfs().
```
```
verify_bfs_tree() is the most expensive.
```
```
compute_levels() is called the most times.
```

**Question2** [10%]: Point out which function is called the most times in the whole program, and which function is its caller (you can screenshot the call graph or answer directly).
```
mod_mac() is called the most times, and its caller is mod_mac_y().
```

### Q5. PyTorch profiler
**Question1**: Please take a screenshot of the analysis result (it must contain the username and machine name, as shown in the first line of the picture above), find the top three entries in the Self CPU column, and explain what operations they are (not only the name of the row, but also what operation it actually performs).

1. **model_inference**: not a PyTorch operator but the user-defined label that the tutorial creates with `torch.profiler.record_function("model_inference")` around the inference call, i.e. the forward pass in which the trained model turns the input into predictions. Its self CPU time is the time spent inside that labeled block that is not attributed to any child operator.
2. **aten::addmm**: the fused "add + matrix multiply" operator, computing `beta * input + alpha * (mat1 @ mat2)` with `alpha` and `beta` defaulting to 1. Fully connected layers typically use it to compute `x @ W^T + b` in a single call.
3. **cudaLaunchKernel**: the CUDA runtime API call that launches a kernel, i.e. a function executed in parallel by many GPU threads. Kernel launches are asynchronous, so its self CPU time is the host-side cost of enqueuing each kernel, not the time the kernel runs on the GPU.
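
To make the `aten::addmm` formula concrete, here is a plain-loop C++ sketch of the same arithmetic on small row-major matrices. `Matrix` and `addmm` are hypothetical names for illustration, not the PyTorch API; the real operator also broadcasts `input` (for example a 1D bias across rows) and runs optimized BLAS or GPU kernels, both of which this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal row-major matrix, for illustration only (not a PyTorch type).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    double &at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Sketch of the addmm semantics: out = beta * input + alpha * (mat1 x mat2).
// Here input must already have the output shape (no broadcasting).
Matrix addmm(const Matrix &input, const Matrix &mat1, const Matrix &mat2,
             double beta = 1.0, double alpha = 1.0)
{
    assert(mat1.cols == mat2.rows);
    assert(input.rows == mat1.rows && input.cols == mat2.cols);
    Matrix out{mat1.rows, mat2.cols, std::vector<double>(mat1.rows * mat2.cols)};
    for (std::size_t i = 0; i < mat1.rows; ++i)
        for (std::size_t j = 0; j < mat2.cols; ++j) {
            double acc = 0.0;  // (mat1 x mat2)[i][j]
            for (std::size_t k = 0; k < mat1.cols; ++k)
                acc += mat1.at(i, k) * mat2.at(k, j);
            out.at(i, j) = beta * input.at(i, j) + alpha * acc;
        }
    return out;
}
```

With `input` set to a bias repeated on every row, `mat1 = x` and `mat2 = W^T`, the result is the `x @ W^T + b` of a fully connected layer.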
**Question2**: Export the trace as a .json file and analyze it in the Chrome trace viewer. Take a screenshot of the entire Chrome window and point out which operations the two most frequent colors represent, excluding the model label (e.g. model_inference in the tutorial).
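1. **aten::linear**: applies a linear transformation `y = x W^T + b` to the input tensor, that is, a matrix multiplication with the weight followed by adding the bias.
2. **aten::einsum**: evaluates a tensor expression written in Einstein summation notation. It takes an equation string (such as `"ij,jk->ik"` for a matrix product) and the input tensors, multiplies them, and sums over every index that does not appear in the output (a tensor contraction).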
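
The two operations are closely related. The sketch below hard-codes the single einsum equation `"bi,oi->bo"`: the index `i` appears in both inputs but not in the output, so it is summed over. That contraction is exactly the matrix product a linear layer computes before adding its bias, `y = x W^T + b`, with `x` of shape batch × in_features and `W` of shape out_features × in_features. The names `Mat`, `einsum_bi_oi_bo` and `linear` are hypothetical; `torch.einsum` itself parses arbitrary equations, which this simplification does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // rows of equal length

// einsum("bi,oi->bo", x, w): i is repeated in the inputs and absent from
// the output, so y[b][o] = sum over i of x[b][i] * w[o][i].
Mat einsum_bi_oi_bo(const Mat &x, const Mat &w)
{
    Mat y(x.size(), std::vector<double>(w.size(), 0.0));
    for (std::size_t b = 0; b < x.size(); ++b)
        for (std::size_t o = 0; o < w.size(); ++o) {
            assert(x[b].size() == w[o].size());
            for (std::size_t i = 0; i < x[b].size(); ++i)
                y[b][o] += x[b][i] * w[o][i];
        }
    return y;
}

// Linear layer: y = x W^T + bias, i.e. the contraction above plus one bias per output.
Mat linear(const Mat &x, const Mat &w, const std::vector<double> &bias)
{
    assert(bias.size() == w.size());
    Mat y = einsum_bi_oi_bo(x, w);
    for (auto &row : y)
        for (std::size_t o = 0; o < row.size(); ++o)
            row[o] += bias[o];
    return y;
}
```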
