# User Guide for BigTensor-single
*<div style="text-align: right">
By Data Mining Laboratory, Seoul National University</div>*
## Contents
1. Intro
2. Overview
3. Operations
4. Table of Implemented Functions
## 1. Intro
* BigTensor-single: a general framework for sparse matrix/tensor operations on heterogeneous platforms.
* Version: 1.0.0
* Date: Jan 17th, 2020
* Authors:
* U Kang (ukang@snu.ac.kr)
* Sejoon Oh (ohhenrie@snu.ac.kr)
* Jungwoo Lee (ljw9111@snu.ac.kr)
* Junki Jang (elnino4@snu.ac.kr)
* Bonhun Koo (darkgs@snu.ac.kr)
* Dawon Ahn (dawon@snu.ac.kr)
* Sangjun Son (lucetre@snu.ac.kr)
## 2. Overview

The BigTensor-single library consists of two header files: SparseMatrix.h and Tensor.h.
* Each header file defines a data format (e.g., the CSR format) and functions (e.g., constructors and I/O functions).
* Base operations in SparseMatrix.h and Tensor.h are implemented in SparseMatrix.cpp and Tensor.cpp, respectively, while the main operations are implemented in individual source files.
* Kernel code, required to run sparse matrix and tensor operations on GPUs, is stored in the Kernel directory.
* Additionally, test code for each operation is stored in the Test_Code directory.
* All matrix and tensor indices in I/O are 1-indexed by default.
### Execution Process
#### Environments
1. g++ compiler
2. C++ standard headers: stdio.h, stdlib.h, time.h, math.h, string.h, vector, and algorithm.
3. OpenCL v1.2+ for GPU acceleration.
4. (Optional) the SnuCL library for distributed-GPU support.
#### How to execute
1. BigTensor-single provides high-performance matrix/tensor operations as a C++ library called from an individual C++ project.
2. Each project should link against the BigTensor library to call the matrix/tensor operations.
3. For each operation, the total elapsed time and error rate are printed to stdout.
4. Sample projects are located in 'src/Test_Code' and can be executed with make commands.
* e.g., run 'make spmm', then execute 'bin/demo_spmm'
## 3. Operations
### 3.1 SparseMatrix
#### SparseMatrix.cpp: data structure for a sparse matrix
* ##### SparseMatrix(int row, int column, int nonzeros, CSR elements)
* Generate a SparseMatrix struct from the given input data
* ##### SparseMatrix(char* Path)
* Generate a SparseMatrix struct by reading data from the given path
* ##### void Print(SparseMatrix A, char *Path)
* Print matrix A in COO format to a given path.
* ##### CSR get_data(), int get_metadata(int what)
* Access the data and metadata of the SparseMatrix struct.
* ##### Random_Generation(int row, int col, float density)
* Generate a random sparse matrix with the given row, col, and density.
### 3.2 Matrix_Operation
#### Sparse_Matrix_Vector_Multiplication.cpp
* ##### float* SparseMatrix::Matrix_Vector_Multiplication(SparseMatrix A, float* B)
* Returns vector C = A x B for the given SparseMatrix A and vector B
* Both vectors B and C are assumed to be dense; the CPU/Single-GPU/Multi-GPU mode is decided according to A's gpu_mode.
* The local size affects performance, so set it in proportion to the matrix size (the default local size is 256).
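The row-parallel strategy described above (one independent dot product per matrix row) can be sketched in plain C++. This is a minimal illustration; the struct fields and function name below are assumptions, not the library's actual CSR API:

```cpp
#include <vector>

// Minimal CSR sparse matrix-vector product. Each row of the output is
// independent, which is what the GPU kernel exploits (one work-item per row).
struct CSRMatrix {
    int rows;
    std::vector<int> row_ptr;   // size rows + 1
    std::vector<int> col_idx;   // size nnz
    std::vector<float> val;     // size nnz
};

std::vector<float> spmv(const CSRMatrix& A, const std::vector<float>& x) {
    std::vector<float> y(A.rows, 0.0f);
    for (int i = 0; i < A.rows; ++i)                           // row-parallel loop
        for (int j = A.row_ptr[i]; j < A.row_ptr[i + 1]; ++j)  // nonzeros of row i
            y[i] += A.val[j] * x[A.col_idx[j]];
    return y;
}
```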
#### Sparse_Matrix_Matrix_Multiplication.cpp: matrix-matrix multiplication
* ##### SparseMatrix SparseMatrix::Matrix_Matrix_Multiplication(SparseMatrix A, SparseMatrix B)
* Takes SparseMatrix A and B as arguments and returns SparseMatrix C = A x B. The function is divided into a preprocessing step that estimates the number of nonzeros in the product of A and B, a GPU kernel step that performs the actual multiplication with row-wise parallelism, and a postprocessing step that converts the output into a SparseMatrix. As before, the CPU/Single-GPU/Multi-GPU environment is determined by A's gpu_mode, and the default local size is 256.
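A CPU sketch of the row-wise multiplication step. The accumulator-based approach shown here (often called Gustavson's algorithm) is an illustration of row-parallel SpMM; the library's GPU kernel and its nonzero-estimation preprocessing are not reproduced, and the struct fields are assumptions:

```cpp
#include <map>
#include <vector>

// Illustrative CSR layout, not the library's actual struct.
struct CSR {
    int rows;
    std::vector<int> row_ptr, col_idx;
    std::vector<float> val;
};

// Row-wise sparse matrix-matrix product: row i of C is a linear
// combination of rows of B, accumulated per output column.
CSR spmm(const CSR& A, const CSR& B) {
    CSR C{A.rows, {0}, {}, {}};
    for (int i = 0; i < A.rows; ++i) {
        std::map<int, float> acc;                      // accumulator for row i of C
        for (int j = A.row_ptr[i]; j < A.row_ptr[i + 1]; ++j) {
            int k = A.col_idx[j];
            float a = A.val[j];
            for (int l = B.row_ptr[k]; l < B.row_ptr[k + 1]; ++l)
                acc[B.col_idx[l]] += a * B.val[l];     // scale and add row k of B
        }
        for (const auto& [col, v] : acc) {             // emit row i, columns sorted
            C.col_idx.push_back(col);
            C.val.push_back(v);
        }
        C.row_ptr.push_back((int)C.col_idx.size());
    }
    return C;
}
```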
#### Sparse_Matrix_Scalar_Operation.cpp
* ##### SparseMatrix Matrix_Scalar_Operation(SparseMatrix A, float c, int op)
* Performs an element-wise scalar operation on a sparse matrix.
* *Input*
* Sparse Matrix A, Scalar c, and op (0: addition, 1: subtraction, 2: multiplication, 3: division)
* *Output*
* Sparse Matrix C = c op A
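Because only stored nonzeros are touched (so the sparsity pattern is preserved), the scalar operation reduces to one pass over the value array. A minimal sketch, with illustrative names; note that, as in the documentation, the scalar c is the left operand:

```cpp
#include <vector>

// Applies C = c op A element-wise over the stored nonzero values only.
// op codes follow the guide: 0 add, 1 subtract, 2 multiply, 3 divide.
std::vector<float> scalar_op(std::vector<float> vals, float c, int op) {
    for (float& v : vals) {
        switch (op) {
            case 0: v = c + v; break;  // addition
            case 1: v = c - v; break;  // subtraction
            case 2: v = c * v; break;  // multiplication
            case 3: v = c / v; break;  // division (assumes nonzero values)
        }
    }
    return vals;
}
```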
#### Sparse_Matrix_Matrix_Operation.cpp
* ##### SparseMatrix Matrix_Matrix_Operation(SparseMatrix A, SparseMatrix B, int op)
* Performs element-wise arithmetic operations between sparse matrices.
* *Input*
* Sparse Matrix A, B, and op (0: addition, 1: subtraction, 2: multiplication, 3: division)
* *Output*
* Sparse Matrix C = A op B
#### Vector_Scalar_Operation.cpp
* ##### float* Vector_Scalar_Operation(float* A, int len, float c, int op, int gpu_mode)
* Performs an element-wise scalar operation on a vector.
* *Input*
* Vector A, length of A, Scalar c, op (0: addition, 1: subtraction, 2: multiplication, 3: division)
* gpu_mode (0: CPU, 1: single, 2: multi)
* *Output*
* Vector C = c op A
### 3.3 Tensor.cpp: implementation of basic tensor operations
* ##### Tensor(int ord, int nnz, int* dim, int* ind, float* val)
* Creates a Tensor struct from the given data
* ##### Tensor(char* Path)
* Reads tensor data from the given path and creates a COO-format Tensor struct
* ##### void Print(Tensor A, char *Path)
* Prints the given tensor A in COO format to the specified path
* ##### int metadata(int what)
* Accesses the information stored in the Tensor struct (order, number of nonzeros, etc.)
* ##### float FindElement (Tensor input, int * index)
* Finds and returns the value at the given index in the input tensor
* ##### float Norm(Tensor input, char type)
* Returns the L1 or L2 norm of the input tensor according to type
* ##### void GPU_INIT (int gpu_mode, char* kernel_code, char* kernel_name)
* Sets up the environment for using GPUs. If gpu_mode is 1, it configures a single-GPU environment; if 2, a multi-GPU environment. The kernel code to be used is also passed as an argument and built in advance.
* ##### void GPU_DONE (int gpu_mode)
* Releases the GPU objects that were set up, according to the given gpu_mode. Without this function, GPU objects accumulate on every GPU operation, causing memory overhead.
* ##### GPU_objects GPU_INIT2 (int gpu_mode, char* kernel_code, char* kernel_name)
* Initializes the GPU_objects struct hidden inside the library for GPU use. Its role is similar to GPU_INIT, but the return type differs: GPU_INIT2 returns the initialized GPU_objects struct, which can then be reused across multiple operations.
* ##### void GPU_DONE2 (int gpu_mode, GPU_objects obj)
* Releases the set-up GPU objects according to the given gpu_mode.
* ##### GPU_Mode_Change (int gpu_mode)
* Changes the GPU environment of the corresponding object to gpu_mode. Each object may require a different environment (CPU/Single-GPU/Multi-GPU); call GPU_Mode_Change to switch gpu_mode appropriately before execution.
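As an example of the basic operations listed above, the logic of *Norm* over a tensor's stored values can be sketched as follows (illustrative only, not the library's implementation; zeros are not stored, so summing the stored values suffices):

```cpp
#include <cmath>
#include <vector>

// L1 ('1') and L2 ('2') norms over a sparse tensor's stored nonzero values.
float tensor_norm(const std::vector<float>& vals, char type) {
    float s = 0.0f;
    for (float v : vals)
        s += (type == '1') ? std::fabs(v) : v * v;  // |v| for L1, v^2 for L2
    return (type == '1') ? s : std::sqrt(s);
}
```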
### 3.4 Tensor_Operation
#### Folding.cpp: contains a method converting a matrix into a tensor
* ##### Tensor Tensor::Folding(SparseMatrix A, int order, int nonzeros, int* dimension, int mode)
* Given a CSR-formatted *SparseMatrix* A and conversion options including the order, dimensions, and number of nonzeros, the *Folding* function returns the converted *Tensor* object.
* Converts CSR (*Compressed Sparse Row*) indices into COO (*Coordinate*) indices by iterating over every entry of the target matrix A.
* *Input*
* `SparseMatrix A` : a matrix to be converted into a tensor
* `int order` : order of the output tensor
* `int nonzeros` : number of nonzeros of the output tensor
* `int* dimension` : dimension array of the output tensor, # of elements should be equal to *order*
* `int mode` : the mode that was used as the basis of the tensor matricization
* *Output*
* `Tensor` : the output converted tensor
#### Matricization.cpp: contains a method converting a tensor into a matrix
* ##### SparseMatrix Tensor::Matricization(Tensor A, int mode)
* Given *Tensor* A and the number of the mode that will become the row basis of the matricized COO tensor, the *Matricization* function converts the tensor into a CSR-formatted matrix.
* *Input*
* `Tensor A` : a tensor to be matricized
* `int mode` : the mode used as the row basis during matricization
* *Output*
* `SparseMatrix` : the output converted matrix
#### N_Mode_Product.cpp: contains a method multiplying a tensor by a matrix along a certain mode
* ##### Tensor Tensor::N_mode_product(Tensor A, SparseMatrix B, int mode)
* Given *Tensor A*, *SparseMatrix B*, and the specified mode, the function *N_mode_product* returns the product tensor of A and B.
* Output tensor is calculated as `A ×_n B`.
* CPU/GPU execution mode is determined by the *gpu_mode* of A.
* *Input*
* `Tensor A` : a tensor to be multiplied
* `SparseMatrix B` : the matrix by which A is multiplied
* `int mode` : a number determining the mode of multiplication
* *Output*
* `Tensor` : the tensor product as `A ×_n B`
#### Permutation.cpp: contains a tensor inherent method permuting the order of modes
* ##### void Tensor::permutate_modes(int *modes)
* Given a tensor and a mode permutation array, the function *permutate_modes* permutes the indices of all tensor entries.
* *Input*
* `int *modes` : a permutation of the natural numbers in the range (*1, order*); entries must not repeat, and the number of entries must equal *order*
* *Output*
* `None`
#### Scaling.cpp: contains a tensor inherent method that scales up one mode
* ##### void Tensor::scaling(int mode)
* Given the tensor _*this_ and the position of the mode to be added, creates a new mode and assigns index 1 on the added mode to all existing entries.
* Afterward, the length of the newly created mode is 1.
* *Input*
* `int mode` : the position at which the new mode is created, in range (1, *order*+1)
* *Output*
* `None`
#### Sparse_Tensor_Scalar_Operation.cpp: contains a method computing scalar operations
* ##### Tensor Tensor_Scalar_Operation(Tensor A, float c, int op)
* Given tensor A, a floating-point number c, and the operation mode, the function *Tensor_Scalar_Operation* returns the computed result as a *Tensor*.
* *Input*
* `Tensor A` : a tensor instance for a scalar operation
* `float c` : a floating-point number for the scalar operation
* `int op` : a number indicating an operation mode (0: addition, 1: subtraction, 2: multiplication, 3: division)
* *Output*
* `Tensor` : the output tensor computed `c op A` (in COO format)
#### Sparse_Tensor_Tensor_Operation.cpp
* ##### Tensor Tensor_Tensor_Operation(Tensor A, Tensor B, enum OP op)
* Given COO-formatted tensors A and B and the operation mode, the function *Tensor_Tensor_Operation* returns the computed tensor.
* Performs element-wise arithmetic (+, -, *, /) and additional operations between two tensors.
* *Input*
* `Tensor A` : a tensor instance to be computed
* `Tensor B` : a tensor instance to be computed
* `enum OP op` : a number indicating an operation mode (0: OP_ADD, 1: OP_SUBTRACT, 2: OP_MULTIPLY, 3: OP_DIVIDE, 4: OP_MOD, 5: OP_AND, 6: OP_OR)
* *Output*
* `Tensor` : the output tensor computed `A op B` (in COO format)
#### Sparse_Tensor_Update.cpp: contains tensor inherent methods updating a tensor for efficient operations
* ##### void Tensor::remove_zeros()
* Given the tensor _*this_, the function *remove_zeros* removes entries whose value is zero and updates *nonzeros*.
* *Input*
* `None`
* *Output*
* `None`
* ##### void Tensor::arrange_index()
* Given the tensor _*this_, the function *arrange_index* sorts the entry indices in dimension-ascending order.
* *Input*
* `None`
* *Output*
* `None`
* ##### void Tensor::update_dimension()
* Given the tensor _*this_, the function *update_dimension* updates the dimension array by iterating over every entry index.
* *Input*
* `None`
* *Output*
* `None`
### 3.5 Tensor_Generation
#### Kronecker.cpp
* ##### Tensor Tensor::Kronecker(Tensor K_i, int S, int gpu_mode)
* Generate a Kronecker Tensor
* *Input*
* the initial tensor (K_i)
* the number of steps (S)
* GPU mode (0: CPU, 1: single, 2: multi)
* *Output*
* Kronecker Tensor
#### R_MAT.cpp
* ##### Tensor Tensor::R_mat(int order, int* dim, int nnz, float* p, int gpu_mode, bool isRandom, int min, int max)
* Generate an R-MAT tensor
* *Input*
* order: number of modes
* int* dim : dimension array of the output tensor
* int nnz : number of nonzeros to generate
* GPU mode (0: CPU, 1: single, 2: multi)
* bool isRandom : whether values are random or all one
* int min : minimum value to generate
* int max : maximum value to generate
* *Output*
* R-mat Tensor
#### Ones.cpp
* ##### Tensor Tensor::Ones(int order, int* dim, int gpu_mode)
* Generate a sparse tensor whose values are one.
* *Input*
* order: number of modes
* int* dim : dimension array of the output tensor
* GPU mode (0: CPU, 1: single, 2: multi)
* *Output*
* Sparse tensor filled with one
#### Random.cpp
* ##### Tensor Tensor::Random(int order, int* dim, int nnz, int gpu_mode, int min, int max)
* Generate a sparse tensor whose values are random.
* *Input*
* order: number of modes
* int* dim : dimension array of the output tensor
* int nnz : number of nonzeros to generate
* GPU mode (0: CPU, 1: single, 2: multi)
* int min : minimum value to generate
* int max : maximum value to generate
* *Output*
* Sparse random tensor
#### Tensor_From_Factors.cpp
* ##### Tensor Tensor::FromFactors(int order, int rank, int* dim, Tensor CoreT, float* FactorM, int form, int gpu_mode)
* Generate a sparse tensor with a Core Tensor and Factor matrices.
* *Input*
* GPU mode (0: CPU, 1: single, 2: multi)
* form : decomposition form (Tucker or CP)
* *Output*
* Sparse Tensor
### 3.6 Tensor_Factorization
* Tensor factorization functions share several features in common:
* Decompose a given tensor with the CP/Tucker method, then save the factor matrices and core tensor to the given path.
* Before using the function, the user should set the metadata of the input tensor: local_size, gpu_mode (number of GPUs), partially_observed (fully or partially observable), and factorize_mode (Tucker or CP).
* After learning finishes, the updated factor matrices and core tensor are stored into the Tensor.
#### tensor_factorization.cpp: Base Tensor decomposition
* ##### void Tensor::tensor_factorization(Tensor X, Tensor CoreT, int rank, char* Path)
#### tensor_factorization_ntf.cpp: Nonnegative Tensor decomposition
* ##### void Tensor::tensor_factorization(Tensor X, Tensor CoreT, int rank, char* Path)
* Updates the factor matrices and core tensor under nonnegativity constraints
#### coupled_tensor_factorization.cpp: Coupled tensor factorization
* ##### void Tensor::tensor_factorization(Tensor X, Tensor Y, Tensor CoreT, Tensor CoreT2, int cmode, int rank, char* Path)
* Decomposes an input tensor X and an input matrix Y at the same time.
* The decomposition can be performed with either the CP or Tucker method.
* Uses CPUs to update the coupled matrix instead of the tensor, for speed.
* *Input*
* Tensor X
* Tensor Y
* Tensor CoreT
* Tensor CoreT2
* int cmode : the coupled mode shared between X and Y
* int rank
* char* Path
* *Output*
* Updated Tensors
## 4. Table of Implemented Functions
