# Assignment II: Valgrind & PyTorch Profiler
## 1. Memcheck - [30%]
### 1. Invalid write of size

The program attempts to write to memory outside the bounds of an allocated block (e.g., past the end of an array); the reported size is the width of the access in bytes.
### 2. Invalid read of size

The program attempts to read memory outside the bounds of an allocated block; again, the reported size is the width of the access in bytes.
### 3. Conditional jump or move depends on uninitialised value(s)

The code makes a control-flow decision (a branch) based on a variable that has never been assigned a value. The resulting behavior is undefined and can lead to unexpected results or crashes.
### 4. Use of uninitialised value

The program reads or uses the value of a variable that has never been assigned a value, for example by passing it to a function or copying it somewhere observable.
### 5. Argument 'size' of function malloc has a fishy (possibly negative) value

The malloc function dynamically allocates memory and takes the size of the block, in bytes, as its argument. The size parameter has type size_t and should be a positive value; Valgrind flags the call when a negative signed value is implicitly converted to size_t, which produces an enormous request.
### 6. Invalid free() / delete / delete[] / realloc()

There is an issue with freeing or reallocating memory. Common causes include:

- Double free: calling free or delete on the same memory block more than once produces an "Invalid free()" error, for example when the same pointer is mistakenly released twice.
- Mismatched allocation and deallocation: memory allocated with one function (e.g., malloc) must be released with its counterpart (free), not with delete; mixing the malloc/free and new/delete families leads to this error.
- Freeing non-heap memory: passing a block that was not dynamically allocated (a stack or global variable) to free or delete results in an "Invalid free()" error; only heap memory obtained from malloc or new can be released this way.
- Using an invalid pointer: passing an invalid or uninitialised pointer to free(), delete, delete[], or realloc() leads to an "Invalid free()" or "Invalid delete[]" error; make sure the pointer refers to a valid, dynamically allocated block.
## 2. Cachegrind - [10%]
### Good

### Bad

There are some possible reasons:

- Data access patterns: the difference in LL cache references could be due to different data access patterns between the two cases. For example, one case might exhibit more frequent cache misses or evictions, resulting in additional LL cache references to fetch data from higher levels of the memory hierarchy.
- Memory locality: if one case has better memory locality than the other, it can lead to fewer cache misses and, consequently, fewer LL cache references. Memory locality refers to the tendency of a program to access data that is spatially or temporally close to previously accessed data; optimizing access patterns to improve locality reduces LL cache references.
- Size of the working set: the working set is the amount of data a program touches during execution. If one case has a larger working set, it may cause more cache evictions and, consequently, more LL cache references.
- Differences in code or algorithmic complexity: if the two cases differ in code or algorithms, memory access patterns change. One case might involve more complex computations, additional data structures, or more memory accesses, leading to increased LL cache references.
## 3. Massif - [10%]
### Result


The total heap bytes allocated with malloc in the .c file are 293600B, while the peak total heap usage reported by ms_print above is 293000B.
## 4. Callgrind - [20%]
[10%] Using the KCachegrind GUI, find the bottleneck function first. Screenshot the call graph of this function including its lower layers, then indicate which function is the most expensive and which function is called the most times in the call graph.
Sorting by Self cost, verify_bfs_tree has the largest value, so it is the bottleneck function and also the most expensive one. Its call graph is as follows:

In the figure, compute_levels is called the most times by the bottleneck function, with 64 calls in total.
[10%] Point out which function has been called the most times in the whole program, and who its caller is (you can screenshot the call graph or answer directly).
Sorting by Called, mod_mac is called the most times; its call graph is as follows:

## 5. PyTorch profiler - [30%]
[20%] Please take a screenshot of the analysis result (it must contain the username and machine name as shown in the first line of the picture above), find the top three entries in the Self CPU column, and point out what operations they are (not only the name of the row, but also what the operation really does).

The top three entries in the Self CPU column are aten::unsqueeze, aten::as_strided, and aten::ne.
aten::unsqueeze: This operation inserts a dimension of size one into a tensor. It takes a tensor and a dimension index and returns a new tensor (a view of the same data) with the extra dimension at the specified position. It is useful for raising a tensor's dimensionality to match the shape a computation expects, or for broadcasting tensors of different shapes.
aten::as_strided: This operation creates a view of a tensor with an explicit size, stride, and (optionally) storage offset. The result shares the same underlying data but is indexed with a different stride pattern; the stride is the number of elements to skip in each dimension when walking the underlying buffer. Because no data is copied, as_strided enables cheap reshaping and memory-layout changes.
aten::ne: This operation performs an element-wise "not equal" comparison of two broadcastable tensors, returning a boolean tensor that marks where the corresponding elements differ. It is commonly used to build masks for conditional expressions, boolean indexing, or selective computations.
[10%] Output a .json file and analyze it in the Chrome trace viewer. Take a screenshot of your entire Chrome screen and point out what operations the two most frequent colors represent, excluding the model label (e.g., model_inference in the tutorials).

The two colors that appear the most correspond to aten::linear and aten::addmm.
aten::linear: This operation applies an affine transformation to the input tensor: y = x·Wᵀ + b. It takes the input tensor, the weight tensor, and an optional bias tensor, computes the matrix product of the input with the transposed weight, and adds the bias. It is the operation behind the linear (fully connected) layer of a neural network.
aten::addmm: This operation fuses a matrix multiplication with an addition: out = β·input + α·(mat1 @ mat2), with β = α = 1 by default, i.e. it is equivalent to torch.mm(mat1, mat2) + input. A linear layer with a bias typically lowers to addmm, with the bias acting as input and the transposed weight as mat2, which is why it appears alongside aten::linear throughout the trace.