# [Exp] Understanding PyTorch Tensor librar(ies)
###### tags: `research-DLRM`
[TOC]
## Tensor library is the core of ML frameworks
Modern ML frameworks come with fancy, convenient Python APIs and versatile Python libraries that help programmers manipulate data. However, under the hood of every ML framework, it is the **tensor library** that does all the heavy lifting.
In the following sections, I will briefly go through the underlying tensor libraries in ML frameworks.
## PyTorch
PyTorch is backed by the **ATen** and **C10** tensor libraries.
### A Tensor Library (ATen library)
The ATen library is largely a wrapper: the `Tensor` class mostly forwards to a reference-counted `c10::TensorImpl` (held through the `impl_` member), which owns the actual metadata and storage.
#### [Tensor](https://github.com/WeiCheng14159/pytorch/blob/master/aten/src/ATen/templates/TensorBody.h) class
#### TensorBase class
- int64_t dim() const
- bool is_floating_point() const
- bool is_complex() const
- int64_t size(int64_t dim) const
- int64_t stride(int64_t dim) const
- size_t nbytes() const
- int64_t numel() const
- ScalarType scalar_type() const
- Device device() const
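To make these accessors concrete, here is a small Python sketch (an illustrative model, not PyTorch code) that derives row-major strides, `numel()`, and `nbytes()` from a size list, mirroring the semantics of `size()`, `stride()`, `dim()`, and `numel()` for a contiguous tensor:

```python
from functools import reduce
import operator

class TensorMeta:
    """Toy model of TensorBase's shape accessors for a contiguous
    tensor, assuming a row-major (C-contiguous) layout."""

    def __init__(self, sizes, itemsize=4):
        self.sizes = list(sizes)
        self.itemsize = itemsize  # e.g. 4 bytes for float32
        # Row-major strides: stride[d] = product of sizes after dim d
        self._strides = []
        acc = 1
        for s in reversed(self.sizes):
            self._strides.append(acc)
            acc *= s
        self._strides.reverse()

    def dim(self):
        return len(self.sizes)

    def size(self, d):
        return self.sizes[d]

    def stride(self, d):
        return self._strides[d]

    def numel(self):
        return reduce(operator.mul, self.sizes, 1)

    def nbytes(self):
        return self.numel() * self.itemsize

t = TensorMeta([2, 3, 4])
print(t.dim(), t.numel(), t.nbytes())         # 3 24 96
print([t.stride(d) for d in range(t.dim())])  # [12, 4, 1]
```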
### Caffe Tensor Library (C10 library)
#### [TensorImpl](https://github.com/WeiCheng14159/pytorch/blob/master/c10/core/TensorImpl.h) class
It contains a pointer to a storage struct which contains the pointer to the actual data (c10::Storage/StorageImpl) and records the data type and device of the view. This allows multiple tensors to alias the same underlying data.
The tensor struct itself records view-specific metadata about the tensor, e.g., sizes, strides and offset into storage. Each view of a storage can have a different size or offset.
- IntArrayRef sizes() const
- IntArrayRef strides() const
- int64_t size (int64_t d) const
- int64_t stride(int64_t d) const
- int64_t dim() const
- int64_t numel() const
- bool is_contiguous()
- **c10::Storage storage_;**
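The view/storage split described above can be sketched in a few lines of Python; `Storage` and `TensorView` below are hypothetical stand-ins for `c10::Storage` and `TensorImpl`, not the real API. Two views with different sizes, strides, and offsets alias the same buffer, so a write through the storage is visible from both:

```python
class Storage:
    """Toy stand-in for c10::Storage: a flat buffer plus a device tag."""
    def __init__(self, data, device="cpu"):
        self.data = list(data)
        self.device = device

class TensorView:
    """Toy stand-in for TensorImpl: view-specific metadata (sizes,
    strides, offset) over a shared Storage."""
    def __init__(self, storage, sizes, strides, offset=0):
        self.storage = storage
        self.sizes, self.strides, self.offset = sizes, strides, offset

    def item(self, *idx):
        # Translate a multi-dimensional index into a flat buffer index
        flat = self.offset + sum(i * s for i, s in zip(idx, self.strides))
        return self.storage.data[flat]

buf = Storage(range(12))                                     # one allocation
a = TensorView(buf, sizes=[3, 4], strides=[4, 1])            # 3x4 view
b = TensorView(buf, sizes=[2, 2], strides=[4, 1], offset=5)  # offset view
buf.data[5] = 99                                             # write once
print(a.item(1, 1), b.item(0, 0))  # both views see 99
```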
#### [Storage](https://github.com/WeiCheng14159/pytorch/blob/master/c10/core/Storage.h) class
- void * data()
- size_t nbytes()
- DeviceType device_type()
- **c10::intrusive_ptr\<StorageImpl\> storage_impl_;**
#### [StorageImpl](https://github.com/WeiCheng14159/pytorch/blob/master/c10/core/StorageImpl.h) class
- void * data()
- size_t nbytes()
- Device device()
- **DataPtr data_ptr_;**
#### [DataPtr](https://github.com/WeiCheng14159/pytorch/blob/master/c10/core/Allocator.h) class
- void * get()
- **c10::detail::UniqueVoidPtr ptr_;**
#### [UniqueVoidPtr](https://github.com/WeiCheng14159/pytorch/blob/master/c10/util/UniqueVoidPtr.h) class
- Why not std::unique_ptr ?
:::info
A detail::UniqueVoidPtr is an owning smart pointer like unique_ptr, but
with three major differences:
1) It is specialized to void
2) It is specialized for a function pointer deleter void(void* ctx); i.e., the deleter doesn't take a reference to the data, just to a context pointer (erased as void*). In fact, internally, this pointer is implemented as having an owning reference to context, and a non-owning reference to data; this is why you release_context(), not release() (the conventional API for release() wouldn't give you enough information to properly dispose of the object later.)
3) The deleter is guaranteed to be called when the unique pointer is destructed and the context is non-null; this is different from std::unique_ptr where the deleter is not called if the data pointer is null.
:::
- Why UniqueVoidPtr is the solution ?
:::info
UniqueVoidPtr solves a common problem for allocators of tensor data, which is that the data pointer (e.g., float*) which you are interested in, is not the same as the context pointer (e.g., DLManagedTensor) which you need to actually deallocate the data.
Under a conventional deleter design, you have to store extra context in the deleter itself so that you can actually delete the right thing. Implementing this with standard C++ is somewhat error-prone: if you use a std::unique_ptr to manage tensors, the deleter will not be called if the data pointer is nullptr, which can cause a leak if the context pointer is non-null (and the deleter is responsible for freeing both the data pointer and the context pointer).
So, in our reimplementation of unique_ptr, we just store the context directly in the unique pointer and attach the deleter to the context pointer itself. In simple cases, the context pointer is just the pointer itself.
:::
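A minimal Python analogue of this design (illustrative only; names and behavior are simplified from `c10::detail::UniqueVoidPtr`) shows the two key properties: ownership lives with the *context*, and the deleter runs whenever the context is non-null, even if the data pointer is null:

```python
class UniqueCtxPtr:
    """Toy analogue of c10::detail::UniqueVoidPtr: holds the data
    pointer non-owningly, owns the context, and runs the deleter on
    the context whenever the context is non-null."""

    def __init__(self, data, ctx, deleter):
        self.data, self.ctx, self.deleter = data, ctx, deleter

    def release_context(self):
        # Hand ownership of the context back to the caller; the
        # deleter will no longer run (mirrors release_context(),
        # not a conventional release()).
        ctx, self.ctx = self.ctx, None
        return ctx

    def close(self):  # stand-in for the C++ destructor
        if self.ctx is not None:
            self.deleter(self.ctx)
            self.ctx = None

freed = []
p = UniqueCtxPtr(data=None, ctx="managed-block", deleter=freed.append)
p.close()
print(freed)  # ['managed-block'] -- deleter ran although data was None
```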
### How to print a PyTorch Tensor in gdb?
```
p *((TensorType *)Tensor.impl_->storage_.storage_impl_->data_ptr_.get())
```
or, more simply:
```
p *((TensorType *)Tensor.data_ptr())
```
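The same cast-and-dereference trick can be mimicked in plain Python with only the standard library; the `array` buffer here is a stand-in for a tensor's storage (not a real PyTorch object), and `buffer_info()` plays the role of `data_ptr()`:

```python
import ctypes
from array import array

# A float32 buffer standing in for a tensor's storage; buffer_info()
# returns its raw address, analogous to Tensor.data_ptr().
storage = array('f', [1.5, 2.5, 3.5])
addr, _ = storage.buffer_info()

# Equivalent of gdb's `p *((float *)ptr)`: cast the raw address to
# a float* and dereference it.
ptr = ctypes.cast(addr, ctypes.POINTER(ctypes.c_float))
print(ptr[0], ptr[1], ptr[2])  # 1.5 2.5 3.5
```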
## TensorFlow
TBD