Make Your Own NumPy! Building Multidimensional Arrays from Scratch - 張安邦

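Multidimensional arrays have become increasingly widespread in recent years. They show up everywhere, from machine learning to data analysis to scientific computing. Yet the principles behind them feel like black magic: the mechanics of broadcasting, view, transpose, and reshape seem impossible to get a grip on. It is actually not that hard! In 40 minutes, this talk walks you through it so you can go home and build your own :3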

Prerequisites

Able to follow intermediate C++ syntax (templates, STL) and familiar with using arrays

tags: SITCON 2020 collaborative notes, SITCON 2020, 2020 collaborative notes, hard, R2

Welcome to SITCON 2020 ヽ(✿゚▽゚)ノ
Collaborative notes index: https://hackmd.io/@SITCON/2020

whoami

張安邦/marty1885

phosh

  • UI for Linux phones
  • GTK + Wayland

Origins

Hierarchical Temporal Memory
A biologically based AI model

Etaler

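Developed by the speaker; stars are welcome
A high-performance HTM implementation
Supports CPU and GPU
Python bindings
One of the three major implementations in the community
Represents data as tensors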

tensor
Working on DL support!

Hierarchical Temporal Memory

  • Typically used for sequence prediction and anomaly detection

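Origins

  • I needed a tensor library that supports both OpenCL and CPU
  • But every search turned up a 404 or a 402
  • So I had to muster the courage to write my own
  • Architecturally close to xtensor and ArrayFire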

ROOT

TOC

  1. Vector and matrix
  2. Shape, stride and view
  3. Broadcasting

Vector and matrix

Array

int arr1[4] = {0,1,2,3};
int arr2[4] = {0,1,2,3};
int res[4];
for (int i = 0; i < 4; i++) {
    res[i] = arr1[i] + arr2[i];
}
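// Tedious
// Also a security problem
// use templates & operator overloading (needs #include <vector>)
template <typename T>
struct tensor
{
    tensor() = default;
    tensor(size_t size)
        :storage(size)
        ,size_(size)
        {}
    size_t size() const { return size_; }

    // assumed members: the notes do not show their declarations
    std::vector<T> storage;
    size_t size_ = 0;
};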
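
The "template & operator overloading" idea above could be fleshed out roughly as follows. This is only a sketch: the name `tensor1d`, the `std::vector` storage, and the bounds-checked `at()` access are assumptions made for illustration, not the speaker's actual code.

```cpp
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <vector>

// Hypothetical 1D tensor; names and members are assumptions for illustration.
template <typename T>
struct tensor1d
{
    tensor1d() = default;
    explicit tensor1d(std::size_t size) : storage(size) {}
    tensor1d(std::initializer_list<T> init) : storage(init) {}

    std::size_t size() const { return storage.size(); }
    // at() is bounds-checked, which addresses the out-of-bounds concern
    T& operator[](std::size_t i) { return storage.at(i); }
    const T& operator[](std::size_t i) const { return storage.at(i); }

    std::vector<T> storage;
};

// Element-wise addition; throws instead of reading past the end.
template <typename T>
tensor1d<T> operator+(const tensor1d<T>& a, const tensor1d<T>& b)
{
    if (a.size() != b.size())
        throw std::invalid_argument("size mismatch");
    tensor1d<T> res(a.size());
    for (std::size_t i = 0; i < a.size(); i++)
        res[i] = a[i] + b[i];
    return res;
}
```

With this in place, the hand-written loop collapses to `auto res = arr1 + arr2;`.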

Matrix

  • 1D -> multi-dimensional
    • shape is extended so it can hold more than two elements

Shape, stride and view

Access element from a tensor
How to reshape the tensor

tensor<int> &
{{1,2},
{3,4}}
a.data()[1] (int) 2
  • overload operator "[]"
  • operator[] only allows 1 param
T & operator[](shape coord){
    return somehow_get_value(coord);
}

How do we access elements with a 2D index?

  • 1D: coord[0]
  • 2D: coord[1] + shape[1] * coord[0]
  • nD: coord[n-1] + \(\sum_{i=0}^{n-2}\)(coord[i] * \(\prod_{j=i+1}^{n-1}\)(shape[j]))
    • the \(\prod\) term is the stride
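T & operator[](shape coord){
    size_t offset = 0;
    for (size_t i = 0; i < coord.size(); i++)
        offset += coord[i] * stride[i]; // stride[i]: the product term above (assumed member)
    return storage[offset];
}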

Elements are accessed with array[{1,2}]
In NumPy it would be array[1,2]
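
To make the `array[{1,2}]` syntax concrete, here is a minimal sketch of a row-major tensor whose `operator[]` takes a braced coordinate list. The alias `shape_t`, the helper `offset_of`, and the member names are hypothetical, and row-major layout (last coordinate moves fastest, as in a C array) is an assumption.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using shape_t = std::vector<std::size_t>; // hypothetical alias

// Row-major offset. This is the Horner form of sum(coord[i] * stride[i]),
// so no stride table is needed yet.
inline std::size_t offset_of(const shape_t& shape, const shape_t& coord)
{
    std::size_t offset = 0;
    for (std::size_t i = 0; i < shape.size(); i++)
        offset = offset * shape[i] + coord[i];
    return offset;
}

template <typename T>
struct tensor
{
    explicit tensor(shape_t shape) : shape_(std::move(shape))
    {
        std::size_t n = 1;
        for (std::size_t s : shape_) n *= s;
        storage_.resize(n);
    }

    // operator[] accepts exactly one parameter, so the whole coordinate
    // is passed as one braced list: a[{1, 2}]
    T& operator[](const shape_t& coord) { return storage_[offset_of(shape_, coord)]; }
    T* data() { return storage_.data(); }

    shape_t shape_;
    std::vector<T> storage_;
};
```

For a 2x2 tensor filled with 1..4 in row order, `a[{0, 1}]` and `a.data()[1]` refer to the same element.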

How views work?

int num[3][4] = {
    {1,2,3,4},
    {5,6,7,8},
    {9,}
};

Now, instead of evaluating the product of shape[j] every time

  • we pre-compute it once and store it as the stride
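
A sketch of that pre-computation, assuming row-major layout; `compute_strides` and `offset_of` are hypothetical helper names, not taken from the talk.

```cpp
#include <cstddef>
#include <vector>

using shape_t = std::vector<std::size_t>; // hypothetical alias

// stride[i] is the product of shape[j] for all j > i, computed once.
inline shape_t compute_strides(const shape_t& shape)
{
    shape_t stride(shape.size(), 1);
    // Walk from the last dimension towards the first.
    for (std::size_t i = shape.size(); i-- > 1; )
        stride[i - 1] = stride[i] * shape[i];
    return stride;
}

// With strides available, indexing is a plain dot product.
inline std::size_t offset_of(const shape_t& stride, const shape_t& coord)
{
    std::size_t offset = 0;
    for (std::size_t i = 0; i < coord.size(); i++)
        offset += coord[i] * stride[i];
    return offset;
}
```

For the 3x4 array above, the strides come out as {4, 1}, so element [2][1] sits at offset 2*4 + 1*1 = 9.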

Take 1D tensor<int> for example
shape

Too complex for a talk
Talk to me later
Special views!

Transpose

2 4 -1
-10 5 11
18 -7 6

becomes

2 -10 18
4 5 -7
-1 11 6
  • reverse the strides (and the shape)!

Reshape

  • you literally just recompute the strides
  • easy and simple
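
Both operations can be sketched on a small non-owning view. The `view` struct and the function names are hypothetical, and `reshape` here assumes the input is contiguous (not already transposed), which is a simplification.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

using shape_t = std::vector<std::size_t>; // hypothetical alias

// Hypothetical non-owning view: a pointer plus shape and strides.
struct view
{
    const int* data;
    shape_t shape;
    shape_t stride;

    int at(const shape_t& coord) const
    {
        std::size_t offset = 0;
        for (std::size_t i = 0; i < coord.size(); i++)
            offset += coord[i] * stride[i];
        return data[offset];
    }
};

// Row-major strides for a freshly laid out shape.
inline shape_t contiguous_strides(const shape_t& shape)
{
    shape_t stride(shape.size(), 1);
    for (std::size_t i = shape.size(); i-- > 1; )
        stride[i - 1] = stride[i] * shape[i];
    return stride;
}

// Transpose: reverse shape and strides; no element is copied.
inline view transpose(view v)
{
    std::reverse(v.shape.begin(), v.shape.end());
    std::reverse(v.stride.begin(), v.stride.end());
    return v;
}

// Reshape (contiguous input assumed): same data, recomputed strides.
inline view reshape(const view& v, const shape_t& new_shape)
{
    std::size_t old_n = 1, new_n = 1;
    for (std::size_t s : v.shape) old_n *= s;
    for (std::size_t s : new_shape) new_n *= s;
    if (old_n != new_n)
        throw std::invalid_argument("element count mismatch");
    return view{v.data, new_shape, contiguous_strides(new_shape)};
}
```

Applied to the 3x3 matrix above with strides {3, 1}, `transpose` yields strides {1, 3}, so reading row 0 of the result walks down column 0 of the original buffer.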

Broadcasting

  • NumPy's killer feature
  • A Mat[\(3*3*3\)] can be added to a Mat[\(3*1*3\)]
  • operating

Limitations on shape

  • Each pair of dimensions must either match, or one of them must be 1
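
The shape limitation can be expressed as a small check. The sketch below assumes the usual NumPy rule: shapes are aligned from the last dimension, missing leading dimensions count as 1, and each aligned pair must be equal or contain a 1. The function name `broadcast_shape` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

using shape_t = std::vector<std::size_t>; // hypothetical alias

// Returns the broadcast result shape, or throws if the shapes are incompatible.
inline shape_t broadcast_shape(const shape_t& a, const shape_t& b)
{
    const std::size_t n = std::max(a.size(), b.size());
    shape_t out(n);
    for (std::size_t k = 0; k < n; k++) {
        // k counts from the last dimension; a missing dimension is treated as 1.
        const std::size_t da = k < a.size() ? a[a.size() - 1 - k] : 1;
        const std::size_t db = k < b.size() ? b[b.size() - 1 - k] : 1;
        if (da != db && da != 1 && db != 1)
            throw std::invalid_argument("shapes cannot be broadcast");
        out[n - 1 - k] = (da == 1) ? db : da;
    }
    return out;
}
```

Under this rule, shapes {3, 3, 3} and {3, 1, 3} combine into {3, 3, 3}, while {2, 3} and {4} are rejected.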

why?

  • A \(3*3\) matrix is also a \(1*1*1*1*1*3*3\) tensor

Observation

  • Dimensions of size 1 do not affect strides

Ex: Turning a scalar into a matrix
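
One common way to realize this, which the notes do not spell out, is to give every broadcast dimension a stride of 0 so that all coordinates along it land on the same element. The sketch below assumes that technique; the `view` struct and `broadcast_to` are hypothetical names, and compatibility of the shapes is assumed to have been checked beforehand.

```cpp
#include <cstddef>
#include <vector>

using shape_t = std::vector<std::size_t>; // hypothetical alias

// Hypothetical non-owning view: a pointer plus shape and strides.
struct view
{
    const int* data;
    shape_t shape;
    shape_t stride;

    int at(const shape_t& coord) const
    {
        std::size_t offset = 0;
        for (std::size_t i = 0; i < coord.size(); i++)
            offset += coord[i] * stride[i];
        return data[offset];
    }
};

// Expand v to `target` without copying. Assumes target has at least as many
// dimensions as v and that the shapes are broadcast-compatible.
inline view broadcast_to(const view& v, const shape_t& target)
{
    shape_t stride(target.size(), 0); // stride 0: every index reads the same element
    const std::size_t pad = target.size() - v.shape.size();
    for (std::size_t i = 0; i < v.shape.size(); i++)
        if (v.shape[i] != 1)
            stride[pad + i] = v.stride[i]; // real dimensions keep their stride
    return view{v.data, target, stride};
}
```

A scalar is a view with an empty shape; broadcasting it to {3, 3} gives strides {0, 0}, so all nine positions read the single stored value.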
