# User Guide for BigTensor-single
*<div style="text-align: right">
By Data Mining Laboratory, Seoul National University</div>*
## Contents
1. Intro
2. Overview
3. Operations
4. Table of Implemented Functions
## 1. Intro
* BigTensor-single: a general framework for sparse matrix/tensor operations on heterogeneous platforms.
* Version: 1.0.0
* Date: Jan 17th, 2020
* Authors:
* U Kang (ukang@snu.ac.kr)
* Sejoon Oh (ohhenrie@snu.ac.kr)
* Jungwoo Lee (ljw9111@snu.ac.kr)
* Junki Jang (elnino4@snu.ac.kr)
* Bonhun Koo (darkgs@snu.ac.kr)
* Dawon Ahn (dawon@snu.ac.kr)
* Sangjun Son (lucetre@snu.ac.kr)
## 2. Overview

The BigTensor-single library consists of two header files: SparseMatrix.h and Tensor.h.
* Each header file defines a data format (e.g., the CSR format) and functions (e.g., constructors and I/O functions).
* Base operations in SparseMatrix.h and Tensor.h are implemented in SparseMatrix.cpp and Tensor.cpp, respectively, while the main operations are implemented in individual source files.
* Kernel code, required to run sparse matrix and tensor operations on GPUs, is stored in the Kernel directory.
* Additionally, test code for each operation is stored in the Test_Code directory.
* All matrix and tensor indices in I/O are 1-indexed by default.
### Execution Process
#### Environments
1. g++ compiler
2. C++ standard headers: stdio.h, stdlib.h, time.h, math.h, string.h, vector, and algorithm.
3. OpenCL v1.2+ for GPU acceleration.
4. (Optional) the SnuCL library for distributed-GPU support.
#### How to execute
1. BigTensor-single provides high-performance matrix/tensor operations as a C++ library called from an individual C++ project.
2. Each project should link against the BigTensor library to call the matrix/tensor operations.
3. For each operation, the total elapsed time and error rate are printed to stdout.
4. Sample projects are located in 'src/Test_Code' and can be executed with make commands.
* e.g., run 'make spmm', then execute 'bin/demo_spmm'
## 3. Operations
### 3.1 SparseMatrix
#### SparseMatrix.cpp: data structure for a sparse matrix
* ##### SparseMatrix(int row, int column, int nonzeros, CSR elements)
* Generate a SparseMatrix struct from the given input data
* ##### SparseMatrix(char* Path)
* Generate a SparseMatrix struct by reading data from the given path
* ##### void Print(SparseMatrix A, char *Path)
* Print matrix A in COO format to a given path.
* ##### CSR get_data(), int get_metadata(int what)
* Access the data and metadata of the SparseMatrix struct.
* ##### Random_Generation(int row, int col, float density)
* Generate a random sparse matrix with the given row, col, and density.
### 3.2 Matrix_Operation
#### Sparse_Matrix_Vector_Multiplication.cpp
* ##### float* SparseMatrix::Matrix_Vector_Multiplication(SparseMatrix A, float* B)
* Returns vector C = A x B for the given SparseMatrix A and vector B
* Both vectors B and C are assumed to be dense; the CPU/Single-GPU/Multi-GPU mode is decided according to A's gpu_mode.
* The local size affects performance, so set it in proportion to the matrix size (the default local size is 256).
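The row-parallel strategy described above (one independent dot product per matrix row) can be sketched in plain C++. This is a minimal illustration; the struct fields and function name below are assumptions, not the library's actual CSR API:

```cpp
#include <vector>

// Minimal CSR sparse matrix-vector product. Each row of the output is
// independent, which is what the GPU kernel exploits (one work-item per row).
struct CSRMatrix {
    int rows;
    std::vector<int> row_ptr;   // size rows + 1
    std::vector<int> col_idx;   // size nnz
    std::vector<float> val;     // size nnz
};

std::vector<float> spmv(const CSRMatrix& A, const std::vector<float>& x) {
    std::vector<float> y(A.rows, 0.0f);
    for (int i = 0; i < A.rows; ++i)                           // row-parallel loop
        for (int j = A.row_ptr[i]; j < A.row_ptr[i + 1]; ++j)  // nonzeros of row i
            y[i] += A.val[j] * x[A.col_idx[j]];
    return y;
}
```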
#### Sparse_Matrix_Matrix_Multiplication.cpp: matrix-matrix multiplication
* ##### SparseMatrix SparseMatrix::Matrix_Matrix_Multiplication(SparseMatrix A, SparseMatrix B)
* Takes SparseMatrix A and B as arguments and returns SparseMatrix C = A x B. The function is divided into a preprocessing step that estimates the number of nonzeros in the product of A and B, a GPU kernel step that performs the actual multiplication with row-wise parallelism, and a postprocessing step that converts the output into a SparseMatrix. As before, the CPU/Single-GPU/Multi-GPU environment is determined by A's gpu_mode, and the default local size is 256.
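A CPU sketch of the row-wise multiplication step. The accumulator-based approach shown here (often called Gustavson's algorithm) is an illustration of row-parallel SpMM; the library's GPU kernel and its nonzero-estimation preprocessing are not reproduced, and the struct fields are assumptions:

```cpp
#include <map>
#include <vector>

// Illustrative CSR layout, not the library's actual struct.
struct CSR {
    int rows;
    std::vector<int> row_ptr, col_idx;
    std::vector<float> val;
};

// Row-wise sparse matrix-matrix product: row i of C is a linear
// combination of rows of B, accumulated per output column.
CSR spmm(const CSR& A, const CSR& B) {
    CSR C{A.rows, {0}, {}, {}};
    for (int i = 0; i < A.rows; ++i) {
        std::map<int, float> acc;                      // accumulator for row i of C
        for (int j = A.row_ptr[i]; j < A.row_ptr[i + 1]; ++j) {
            int k = A.col_idx[j];
            float a = A.val[j];
            for (int l = B.row_ptr[k]; l < B.row_ptr[k + 1]; ++l)
                acc[B.col_idx[l]] += a * B.val[l];     // scale and add row k of B
        }
        for (const auto& [col, v] : acc) {             // emit row i, columns sorted
            C.col_idx.push_back(col);
            C.val.push_back(v);
        }
        C.row_ptr.push_back((int)C.col_idx.size());
    }
    return C;
}
```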
#### Sparse_Matrix_Scalar_Operation.cpp
* ##### SparseMatrix Matrix_Scalar_Operation(SparseMatrix A, float c, int op)
* Performs an element-wise scalar operation on a sparse matrix.
* *Input*
* Sparse Matrix A, Scalar c, and op (0: addition, 1: subtraction, 2: multiplication, 3: division)
* *Output*
* Sparse Matrix C = c op A
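Because only stored nonzeros are touched (so the sparsity pattern is preserved), the scalar operation reduces to one pass over the value array. A minimal sketch, with illustrative names; note that, as in the documentation, the scalar c is the left operand:

```cpp
#include <vector>

// Applies C = c op A element-wise over the stored nonzero values only.
// op codes follow the guide: 0 add, 1 subtract, 2 multiply, 3 divide.
std::vector<float> scalar_op(std::vector<float> vals, float c, int op) {
    for (float& v : vals) {
        switch (op) {
            case 0: v = c + v; break;  // addition
            case 1: v = c - v; break;  // subtraction
            case 2: v = c * v; break;  // multiplication
            case 3: v = c / v; break;  // division (assumes nonzero values)
        }
    }
    return vals;
}
```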
#### Sparse_Matrix_Matrix_Operation.cpp
* ##### SparseMatrix Matrix_Matrix_Operation(SparseMatrix A, SparseMatrix B, int op)
* Performs element-wise arithmetic operations between sparse matrices.
* *Input*
* Sparse Matrix A, B, and op (0: addition, 1: subtraction, 2: multiplication, 3: division)
* *Output*
* Sparse Matrix C = A op B
#### Vector_Scalar_Operation.cpp
* ##### float* Vector_Scalar_Operation(float* A, int len, float c, int op, int gpu_mode)
* Performs an element-wise scalar operation on a vector.
* *Input*
* Vector A, length of A, Scalar c, op (0: addition, 1: subtraction, 2: multiplication, 3: division)
* gpu_mode (0: CPU, 1: single, 2: multi)
* *Output*
* Vector C = c op A
### 3.3 Tensor.cpp: implementation of basic tensor operations
* ##### Tensor(int ord, int nnz, int* dim, int* ind, float* val)
* Creates a Tensor struct from the given data
* ##### Tensor(char* Path)
* Reads tensor data from the given path and creates a COO-format Tensor struct
* ##### void Print(Tensor A, char *Path)
* Prints the given tensor A in COO format to the specified path
* ##### int metadata(int what)
* Accesses the information stored in the Tensor struct (order, number of nonzeros, etc.)
* ##### float FindElement (Tensor input, int * index)
* Finds and returns the value at the given index in the input tensor
* ##### float Norm(Tensor input, char type)
* Returns the L1 or L2 norm of the input tensor according to type
* ##### void GPU_INIT (int gpu_mode, char* kernel_code, char* kernel_name)
* Sets up the environment for using GPUs. If gpu_mode is 1, it configures a single-GPU environment; if 2, a multi-GPU environment. The kernel code to be used is also passed as an argument and built in advance.
* ##### void GPU_DONE (int gpu_mode)
* Releases the GPU objects that were set up, according to the given gpu_mode. Without this function, GPU objects accumulate on every GPU operation, causing memory overhead.
* ##### GPU_objects GPU_INIT2 (int gpu_mode, char* kernel_code, char* kernel_name)
* Initializes the GPU_objects struct hidden inside the library for GPU use. Its role is similar to GPU_INIT, but the return type differs: GPU_INIT2 returns the initialized GPU_objects struct, which can then be reused across multiple operations.
* ##### void GPU_DONE2 (int gpu_mode, GPU_objects obj)
* Releases the set-up GPU objects according to the given gpu_mode.
* ##### GPU_Mode_Change (int gpu_mode)
* Changes the GPU environment of the corresponding object to gpu_mode. Each object may require a different environment (CPU/Single-GPU/Multi-GPU); call GPU_Mode_Change to switch gpu_mode appropriately before execution.
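As an example of the basic operations listed above, the logic of *Norm* over a tensor's stored values can be sketched as follows (illustrative only, not the library's implementation; zeros are not stored, so summing the stored values suffices):

```cpp
#include <cmath>
#include <vector>

// L1 ('1') and L2 ('2') norms over a sparse tensor's stored nonzero values.
float tensor_norm(const std::vector<float>& vals, char type) {
    float s = 0.0f;
    for (float v : vals)
        s += (type == '1') ? std::fabs(v) : v * v;  // |v| for L1, v^2 for L2
    return (type == '1') ? s : std::sqrt(s);
}
```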
### 3.4 Tensor_Operation
#### Folding.cpp: contains a method converting a matrix into a tensor
* ##### Tensor Tensor::Folding(SparseMatrix A, int order, int nonzeros, int* dimension, int mode)
* Given a CSR-formatted *SparseMatrix* A and conversion options including the order, dimensions, and number of nonzeros, the *Folding* function returns the converted *Tensor* object.
* Converts CSR (*Compressed Sparse Row*) indices into COO (*Coordinate*) indices by iterating over every entry of the target matrix A.
* *Input*
* `SparseMatrix A` : a matrix to be converted into a tensor
* `int order` : order of the output tensor
* `int nonzeros` : number of nonzeros of the output tensor
* `int* dimension` : dimension array of the output tensor, # of elements should be equal to *order*
* `int mode` : the mode that was used as the basis of the tensor matricization
* *Output*
* `Tensor` : the output converted tensor
#### Matricization.cpp: contains a method converting a tensor into a matrix
* ##### SparseMatrix Tensor::Matricization(Tensor A, int mode)
* Given *Tensor* A and the number of the mode that will become the row basis of the matricized COO tensor, the *Matricization* function converts the tensor into a CSR-formatted matrix.
* *Input*
* `Tensor A` : a tensor to be matricized
* `int mode` : the mode used as the row basis during matricization
* *Output*
* `SparseMatrix` : the output converted matrix
#### N_Mode_Product.cpp: contains a method multiplying a tensor by a matrix along a certain mode
* ##### Tensor Tensor::N_mode_product(Tensor A, SparseMatrix B, int mode)
* Given *Tensor A*, *SparseMatrix B*, and the specified mode, the function *N_mode_product* returns the product tensor of A and B.
* Output tensor is calculated as `A ×_n B`.
* CPU/GPU execution mode is determined by the *gpu_mode* of A.
* *Input*
* `Tensor A` : a tensor to be multiplied
* `SparseMatrix B` : the matrix by which A is multiplied
* `int mode` : a number determining the mode of multiplication
* *Output*
* `Tensor` : the tensor product as `A ×_n B`
#### Permutation.cpp: contains a tensor inherent method permuting the order of modes
* ##### void Tensor::permutate_modes(int *modes)
* Given a tensor and a mode permutation array, the function *permutate_modes* permutes the indices of all tensor entries.
* *Input*
* `int *modes` : a permutation of the natural numbers in the range (*1, order*); entries must not repeat, and the number of entries must equal *order*
* *Output*
* `None`
#### Scaling.cpp: contains a tensor inherent method that scales up one mode
* ##### void Tensor::scaling(int mode)
* Given the tensor _*this_ and the position of the mode to be added, creates a new mode and assigns index 1 on the added mode to all existing entries.
* Afterward, the length of the newly created mode is 1.
* *Input*
* `int mode` : the position at which the new mode is created, in range (1, *order*+1)
* *Output*
* `None`
#### Sparse_Tensor_Scalar_Operation.cpp: contains a method computing scalar operations
* ##### Tensor Tensor_Scalar_Operation(Tensor A, float c, int op)
* Given tensor A, a floating-point number c, and the operation mode, the function *Tensor_Scalar_Operation* returns the computed result as a *Tensor*.
* *Input*
* `Tensor A` : a tensor instance for a scalar operation
* `float c` : a floating-point number for the scalar operation
* `int op` : a number indicating an operation mode (0: addition, 1: subtraction, 2: multiplication, 3: division)
* *Output*
* `Tensor` : the output tensor computed `c op A` (in COO format)
#### Sparse_Tensor_Tensor_Operation.cpp
* ##### Tensor Tensor_Tensor_Operation(Tensor A, Tensor B, enum OP op)
* Given COO-formatted tensors A and B and the operation mode, the function *Tensor_Tensor_Operation* returns the computed tensor.
* Performs element-wise arithmetic (+, -, *, /) and additional operations between two tensors.
* *Input*
* `Tensor A` : a tensor instance to be computed
* `Tensor B` : a tensor instance to be computed
* `enum OP op` : a number indicating an operation mode (0: OP_ADD, 1: OP_SUBTRACT, 2: OP_MULTIPLY, 3: OP_DIVIDE, 4: OP_MOD, 5: OP_AND, 6: OP_OR)
* *Output*
* `Tensor` : the output tensor computed `A op B` (in COO format)
#### Sparse_Tensor_Update.cpp: contains tensor inherent methods updating a tensor for efficient operations
* ##### void Tensor::remove_zeros()
* Given the tensor _*this_, the function *remove_zeros* removes entries whose value is zero and updates *nonzeros*.
* *Input*
* `None`
* *Output*
* `None`
* ##### void Tensor::arrange_index()
* Given the tensor _*this_, the function *arrange_index* sorts the entry indices in dimension-ascending order.
* *Input*
* `None`
* *Output*
* `None`
* ##### void Tensor::update_dimension()
* Given the tensor _*this_, the function *update_dimension* updates the dimension array by iterating over every entry index.
* *Input*
* `None`
* *Output*
* `None`
### 3.5 Tensor_Generation
#### Kronecker.cpp
* ##### Tensor Tensor::Kronecker(Tensor K_i, int S, int gpu_mode)
* Generate a Kronecker Tensor
* *Input*
* the initial tensor (K_i)
* the number of steps (S)
* GPU mode (0: CPU, 1: single, 2: multi)
* *Output*
* Kronecker Tensor
#### R_MAT.cpp
* ##### Tensor Tensor::R_mat(int order, int* dim, int nnz, float* p, int gpu_mode, bool isRandom, int min, int max)
* Generate an R-MAT tensor
* *Input*
* order: number of modes
* int* dim : dimension array of the output tensor
* int nnz : number of nonzeros to generate
* GPU mode (0: CPU, 1: single, 2: multi)
* bool isRandom : whether values are random or all one
* int min : minimum value to generate
* int max : maximum value to generate
* *Output*
* R-mat Tensor
#### Ones.cpp
* ##### Tensor Tensor::Ones(int order, int* dim, int gpu_mode)
* Generate a sparse tensor whose values are one.
* *Input*
* order: number of modes
* int* dim : dimension array of the output tensor
* GPU mode (0: CPU, 1: single, 2: multi)
* *Output*
* Sparse tensor filled with one
#### Random.cpp
* ##### Tensor Tensor::Random(int order, int* dim, int nnz, int gpu_mode, int min, int max)
* Generate a sparse tensor whose values are random.
* *Input*
* order: number of modes
* int* dim : dimension array of the output tensor
* int nnz : number of nonzeros to generate
* GPU mode (0: CPU, 1: single, 2: multi)
* int min : minimum value to generate
* int max : maximum value to generate
* *Output*
* Sparse random tensor
#### Tensor_From_Factors.cpp
* ##### Tensor Tensor::FromFactors(int order, int rank, int* dim, Tensor CoreT, float* FactorM, int form, int gpu_mode)
* Generate a sparse tensor with a Core Tensor and Factor matrices.
* *Input*
* GPU mode (0: CPU, 1: single, 2: multi)
* form : decomposition form (Tucker or CP)
* *Output*
* Sparse Tensor
### 3.6 Tensor_Factorization
* Tensor factorization functions share several features in common:
* Decompose a given tensor with the CP/Tucker method, then save the factor matrices and core tensor to the given path.
* Before using the function, the user should set the metadata of the input tensor: local_size, gpu_mode (number of GPUs), partially_observed (fully or partially observable), and factorize_mode (Tucker or CP).
* After learning finishes, the updated factor matrices and core tensor are stored into the Tensor.
#### tensor_factorization.cpp: Base Tensor decomposition
* ##### void Tensor::tensor_factorization(Tensor X, Tensor CoreT, int rank, char* Path)
#### tensor_factorization_ntf.cpp: Nonnegative Tensor decomposition
* ##### void Tensor::tensor_factorization(Tensor X, Tensor CoreT, int rank, char* Path)
* Updates the factor matrices and core tensor under nonnegativity constraints
#### coupled_tensor_factorization.cpp: Coupled tensor factorization
* ##### void Tensor::tensor_factorization(Tensor X, Tensor Y, Tensor CoreT, Tensor CoreT2, int cmode, int rank, char* Path)
* Decomposes an input tensor X and an input matrix Y at the same time.
* The decomposition can be performed with either the CP or Tucker method.
* Uses CPUs to update the coupled matrix instead of the tensor, for speed.
* *Input*
* Tensor X
* Tensor Y
* Tensor CoreT
* Tensor CoreT2
* int cmode : the coupled mode shared between X and Y
* int rank
* char* Path
* *Output*
* Updated Tensors
## 4. Table of Implemented Functions
