2024q1 Homework6 (integration)

> # 2024q1 Homework6 (integration) contributed by < `SimonLee0316` > ## 實驗環境 ```shell $ gcc --version gcc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0 Copyright (C) 2022 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 16 On-line CPU(s) list: 0-15 Vendor ID: GenuineIntel Model name: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz CPU family: 6 Model: 165 Thread(s) per core: 2 Core(s) per socket: 8 Socket(s): 1 Stepping: 5 CPU max MHz: 4800.0000 CPU min MHz: 800.0000 BogoMIPS: 5799.77 Caches (sum of all): L1d: 256 KiB (8 instances) L1i: 256 KiB (8 instances) L2: 2 MiB (8 instances) L3: 16 MiB (1 instance) NUMA: NUMA node(s): 1 NUMA node0 CPU(s): 0-15 ``` ## 自我檢查清單 - [x] 研讀前述 Linux 效能分析描述，在自己的實體電腦運作 GNU/Linux，做好必要的設定和準備工作。 - [x] 閱讀[〈Linux 核心模組運作原理〉](https://hackmd.io/@sysprog/linux-kernel-module#Linux-%E6%A0%B8%E5%BF%83%E6%A8%A1%E7%B5%84%E9%81%8B%E4%BD%9C%E5%8E%9F%E7%90%86)並對照 Linux 核心原始程式碼 (v6.1+)，解釋 `insmod` 後，Linux 核心模組的符號 (symbol) 如何被 Linux 核心找到 (使用 List API)、`MODULE_LICENSE` 巨集指定的授權條款又對核心有什麼影響 (GPL 與否對於可用的符號列表有關)，以及藉由 strace 追蹤 Linux 核心的掛載，涉及哪些些系統呼叫和子系統？編譯完核心模組後，使用`insmod fibdrv.ko`，將核心模組掛載到 linux 核心，但因為 `fibdrv.ko` 是 user space 的程序，透過 `insmod` 使它掛載到核心，裡面的 `kernal module` 資料要複製到 kernal sapce 中才能執行。在將 userspace 掛載到核心的過程中，可以透過 strace 看到，有將 `fibdrv.ko` 開啟並得到 file descriptor 並將其傳入 `finit_module` ，且透過` mmap` 將 `fibdrv.ko` 映射到一段記憶體裡面，在 `finit_module` 中使用 `kernel_read_file_from_fd()`將`fibdrv.ko`資料載入到核心記憶體中，呼叫 `load_module` 將為模組配置記憶體以及初始化核心模組。 :::danger 詳細描述流程和方法。 ::: 使用 <s>trace 命令</s> strace 工具觀察 `insmod` 過程中，發現會使用 `finit_module`，去做 load module，以及大量的系統呼叫如 `close(),mmap(),read(),openat(),newfstatat()`。透過追蹤程式碼可以發現， load module 中的 `simplify_symbols() -> resolve_symbol_wait() -> resolve_symbol() -> find_symbol()`內有使用 List API (list_for_each_entry_rcu())。 `MODULE_LICENSE` 巨集指定的授權條款可以確保模組合法性，而在在這些 license idents 裡面<s>免費</s> 可以自由使用的有 `GPL, GPL v2, GPL and additional rights, Dual BSD/GPL, Dual MIT/GPL, Dual MPL/GPL` ，以及需要<s>付費</s> 不能自由使用的 `Proprietary`。 :::warning 當我們提到自由軟體 (free software) 時，我們是指自由 (freedom)，而非免費。 GPL 授權許可是設計來確保你有散播自由軟體的自由 (你若願意，你可以此服務收費)，確保你可取得原始碼，確保你可以修改該軟體並使用它為基礎創造新的自由軟體；以及確保你認知到你擁有這些權力 -[從 Revolution OS 看作業系統生態變化](https://hackmd.io/@sysprog/revolution-os-note)。 ::: :::danger 區隔方式並非「免費」和「付費」，請詳閱授權條款，搭配第三週教材 [從 Revolution OS 看作業系統生態變化](https://hackmd.io/@sysprog/revolution-os-note) ::: - [x] 閱讀《The Linux Kernel Module Programming Guide》(LKMPG) 並解釋 simrupt 程式碼裡頭的 mutex lock 的使用方式，並探討能否改寫為 lock-free; 在 simrupt 使用到 mutex lock 的地方在 `simrupt_read()`，這邊嘗試獲取 lock使用到 `mutex_lock_interruptible()`，與一般的 mute lock 不同，一般的 mutex lock 再沒有取得 lock 之前，會被停住直到有人釋放 lock，而 `mutex_lock_interruptible()`，在註解的地方解釋如果 process 在睡眠狀態且收到中斷的話會回傳 `%-EINTR`且不在嘗試獲取 lock，釋放 lock 使用 `mutex_unlock`。在迴圈裡面還有一個函式 `wait_event_interruptible()` ，其定義在 [/include/linux/wait.h](https://elixir.bootlin.com/linux/latest/source/include/linux/wait.h#L495)，它會進入睡眠直到條件變數為真，在任何可能更改條件變數之後必須使用 `wake_up()`，而在這邊使用的是 `wake_up_interruptible`，可以達到一樣的效果。 - [ ] 探討 Timsort, Pattern Defeating Quicksort (pdqsort) 及 Linux 核心 lib/sort.c 在排序過程中的平均比較次數，並提供對應的數學證明; - [x] 研讀 CMWQ (Concurrency Managed Workqueue) 文件，對照 simrupt 專案的執行表現，留意到 worker-pools 類型可指定 "Bound" 來分配及限制特定 worker 執行於指定的 CPU，Linux 核心如何做到？CMWQ 關聯的 worker thread 又如何與 CPU 排程器互動？專案內配置 workqueue 使用的 flag 為 `WQ_UNBOUND`，根據 [CMWQ](https://www.kernel.org/doc/html/latest/core-api/workqueue.html) flag 的解釋，未綁定的 workqueue，會由一個特殊的 worker-pool 處理，這些 worker 不榜定到任何的 cpu，可以在任何 cpu 上執行。如果在搭配 api `queue_work_on()` ，可以將 workqueue 榜定在特定的 cpu上面執行。我認為 worker thread 有點像是 thread pool 的概念，排程器會根據 worker thread 目前的數量去分配工作。 - [ ] 解釋 xoroshiro128+ 的原理 (對照〈Scrambled Linear Pseudorandom Number Generators〉論文)，並利用 ksort 提供的 xoro 核心模組，比較 Linux 核心內建的 /dev/random 及 /dev/urandom 的速度，說明 xoroshiro128+ 是否有速度的優勢？其弱點又是什麼？ ## ksort: 處理並行排序的 Linux 核心模組觀察使用 `make check` 來理解如何掛載核心模組。 ```makefile check: all $(MAKE) insmod sudo ./user sudo ./test_xoro $(MAKE) rmmod ``` check 有四個 target ，以下一一分析。 1. 將模組掛載入核心。 * 將 `user.c , test_xoro.c` 編譯成 object file。 ```makefile all: $(GIT_HOOKS) user test_xoro $(MAKE) -C $(KDIR) M=$(PWD) modules ``` * 將 target module 編譯成 kernal object。 * module 名稱為 `sort.o, xoro.o` ，每個object 依賴於自己 `sort-objs`。 ```makefile obj-m += sort.o sort-objs := \ sort_mod.o \ sort_impl.o obj-m += xoro.o xoro-objs := \ xoro_mod.o ``` 2. 執行 `./user` 透過 Linux Virtual File System 介面，本核心模組可排序使用者層級指定的資料，並讓 user.c 程式得以存取。觀察 user.c 看到先打開了 `/dev/sort`，且隨機產生 1000 個數字加入指標 `inbuf`，呼叫 `read()`，透過 Linux Virtual File System 介面，核心模組可排序使用者層級指定的資料，並讓 client.c 程式得以存取。 ```c static const struct file_operations fops = { .read = sort_read, .write = sort_write, .open = sort_open, .release = sort_release, .owner = THIS_MODULE, }; ``` [/include/linux/fs.h](https://elixir.bootlin.com/linux/latest/source/include/linux/fs.h#L1983) 裡面有對於這個資料結構的用法，而這個程式將對檔案的操作做了改寫並重新定義在 `sort_mod.c`，在呼叫 `read(fd, inbuf, size);` 函式的時候，將 `inbuf` 的資料複製到 `sort_buffer` 裡面經過排序後的資料在複製回去 `inbuf` ，並回傳資料長度。 3. 執行 `./test_xoro`。 4. 將模組從核心卸載。 ## 作業主題一: 並行的混合排序 :::success 換個寫法將這演算法定義成 enum ，本方法參考 [dockyu](https://github.com/dockyu/ksort) 同學。 ::: ```c typedef enum { QSORT, TIMSORT, PDQSORT, LINUX_SORT } sort_method_t; ``` 透過 `write()` 函式將排序演算法的種類透過 buff 傳入並將值賦予核心模組內的全域變數 sort_method 用來表示現在要用那一種排序演算法。 ```c static ssize_t sort_write(struct file *file, const char *buf, size_t size, loff_t *offset) { int method; if (copy_from_user(&method, buf, sizeof(int))) { return -EFAULT; } sort_method = (sort_method_t) method; printk(KERN_INFO "Set sort method: %s\n", get_sort_method_name(sort_method)); return sizeof(method); } ``` 隨後呼叫 `read()` ，進行排序，在呼叫 `sort_main()` 時將 typeOfsort 一起傳入，判斷是那一種演算法並且去執行。 ```c void sort_main(void *sort_buffer, size_t size, size_t es,sort_method_t sort_method) { switch (sort_method) { case TIMSORT: printk(KERN_INFO "Do TIMSORT\n"); /*implement algorithm*/ break; case LINUX_SORT: printk(KERN_INFO "Do LINUX_SORT\n"); /*implement algorithm*/ break; case QSORT: printk(KERN_INFO "Do QSORT\n"); /*implement algorithm*/ break; case PDQSORT: printk(KERN_INFO "Do PDQSORT\n"); /*implement algorithm*/ break; } } ``` ### 整合第一週測驗提到的 Timsort 資料結構如下所示 ```graphviz digraph Timsort { rankdir=LR; node [shape=record]; subgraph cluster_a { label = Timosrt subgraph cluster_b{ label = work_struct; struct0 [label="fun | <entry>entry"]; } struct1 [label="n"]; struct2 [label="tim_cmp"]; struct3 [label="sample_head |{<pre>pre|<next>next}"]; } subgraph cluster_d { node [shape=record]; value [label="{value}"]; head_e [label="head|{<prev>prev|<next>next}"]; label=element_t; } struct3:next->head_e:head; head_e:prev->struct3:sample_head; head_e:next->struct3:sample_head; struct3:pre->head_e:head; struct3:next->head_e:head; } ``` Timsort 的實做使用 list.h 所提供的 api ，所以要把資料的形式轉換成 Timsort 所接受的形式。以下為將 buffer 的資料轉換和做完排序之後要將資料存回 buffer。 ```c static void copy_buf_to_list(struct list_head *head, int *a, size_t len) { for (size_t i = 0; i < len; i++) { element_t *new = kmalloc(sizeof(element_t), GFP_KERNEL); new->val = a[i]; list_add_tail(&new->list, head); } } static void copy_list_to_buf(int *a, struct list_head *head, size_t len) { element_t *cur; int i = 0; list_for_each_entry (cur, head, list) { a[i] = cur->val; i++; } } ``` 將 `Timsort.[ch]` 加入專案，再做一點修改，變數 priv 是用來計算交換次數的但在我們這個專案有點累綴，但為了保持測驗1 Timsort 的原樣我選擇保留。 `Timsort.c` ```c void timsort(struct work_struct *w) { printk(KERN_INFO "start timsort \n"); struct timsort_for_work *ts = container_of(w, struct timsort_for_work, w); struct list_head *head = &ts->sample_head; void *priv = 0; list_cmp_func_t cmp = ts->tim_cmp; /*...*/ } ``` 結果 : ``` sudo ./user Soring succeeded! ``` commit : [14a4ce6](https://github.com/sysprog21/ksort/commit/14a4ce62ffc76651e1f10fecea774f6b1ac07ac4) ### 整合 Linux 核心 lib/sort.c 宣告結構 ```c struct linuxsort { struct work_struct w; struct commonforlinux *common; void *a; size_t n; }; struct commonforlinux { int swaptype; /* Code to use for swapping */ size_t es; /* Element size. */ cmp_func_t linux_cmp; /* Comparison function */ }; ``` 將排序演算法加入 workqueue，在執行。 ```c INIT_WORK(&ls->w, linuxsort_algo); queue_work(workqueue, &ls->w); ``` 使用 `linux/sort.h` 內之 sort。 ```c static void linuxsort_algo(struct work_struct *w) { struct linuxsort *ls = container_of(w, struct linuxsort, w); void *base; /* Array of elements. */ size_t num, size; /* Number of elements; size. */ cmp_func_t cmp_func; struct commonforlinux *c; c = ls->common; base = ls->a; num = ls->n; size = c->es; cmp_func = c->linux_cmp; sort(base, num, size, cmp_func, NULL); } ``` commit : [c951207](https://github.com/sysprog21/ksort/commit/c951207a09c59a218bd09bbd866b87e7e3e13a34) ### 閱讀 Pattern Defeating Quicksort (pdqsort) 這個排序演算法結合了 `insertion sort, quick sort, heap sort` 來確保排序演算法 worst-case 為 $O(nk)$。使用 `insertion sort` 來節省函式遞迴呼叫的成本，降低記憶體使用量。使用 `heap sort` 確保時間複雜度在 $O(nlogn)$。先探討為何 `quick sort` 的 worst-case 是 $O(n^2)$，因為 pivot 的選擇是在最終結果的最左邊或最右邊(亦即選出來的 pivot 太大或太小)，而讓這個遞迴深度不平衡，所以這邊作者是用 **median of 3** ，來挑選 pivot ，然而這個法也有可能讓 partition 的結果不平衡，所以額外使用了 **bad partition**來當作判斷，我們用 $p$ 來當作 partition 的分割狀況，完美的 partition 是 $p=0.5$ ，代表選擇的 pivot 剛好可以將資料分成兩半，當偏離 0.5 越多代表這次的分段品質越差，那如何定義 **bad** 呢？使用資訊熵將角度來判斷 $$ \lim_{n \to \infty}\frac{T(n,p)}{T(n,\frac{2}{1})}=\frac{1}{H(p)}， H(p)=-plog_2(p)-(1-p)log_2(1-p) $$ 論文使用 $p=0.125$ 來作為 bad partition 的值，如果小於這個值的話則視為 bad partition ，每當遇到 bad aprtition ，limit 值(初始為 $logn$) 就會減少，當值為 0 時，則換用 isertion sort 或者 heap sort ，來確保不會落入遞迴深度過身的情況。解決完 pivot 的選擇之後接下來還要解決一個會造成 worst-case 的問題，也就是 **重複計算相同 pivot**，重複計算相同 pivot 沒辦法使演算法有實際的進展，所以作者提出新的 partition 方法，提出了兩個函式，分別為 `partition left, partition right` partition left : 將小於等於 **pivot** 的元素放到 pivot 的左側。 partition right : 將大於等於 **pivot** 的元素放到 pivot 的右側。分段規則： 1. median of 3 找到一個 pivot **p** 。 2. partion right 。 3. median of 3 找到右側的新 pivot **q**。 4. 如果 $p\neq q\ 的\ predecessor$，則遞迴右邊直到 $p = q\ 的\ predecessor$。如果 $p = q\ 的\ predecessor$，則做 partition left。取作者(Orson R. L. Peters)在 Youtube 上的[演說範例](https://www.youtube.com/watch?v=jz-PBiWwNjc&t=1s&ab_channel=CWIDatabaseArchitectures)，可以更好的理解： ![image](https://hackmd.io/_uploads/BJcWPza-0.png) 在 insertion sort 中，使用了 partial insertion sort 來加快效能，partial 的意思是，我們會追蹤這次排序交換了多少次元素，如果超過某個閥值 (論文取 8)，則直接結束插入排序的動作。若低於閥值的話，代表我們已經排序好左右子序列，不用繼續往下遞迴。論文中也提到許多能夠在加速計算的方法，如 swapless,branch predictions eliminated。 ### 整合 Pattern Defeating Quicksort (pdqsort) ### 從使用者層級的程式對裝置檔案進行設定 (如 write 系統呼叫)，讓這些排序實作存在於同一個 Linux 核心模組，並得以切換和比較透過 write 傳遞參數，改便目前要執行的排序演算法。 `user.c` ```c sort_method_t method = LINUXSORT; if (write(fd, &method, sizeof(method)) != sizeof(method)) { perror("Failed to set sort method"); close(fd); goto error; } ``` 在 `user.c` 內，每次執行完一種完算法透過 write 系統呼叫改變演算法類型，在透過 read 系統呼叫將測試資料傳入並執行排序演算法。 ### 在使用者和 linux 核心空間量化執行時間先設計如何測試，資料個數從 1000 到 20000 ，每次增加 500 筆。每次做完都將資料寫入 csv 檔案，在透過 `make plot` 使用 gnuplot 將資料可視化。 :::info 這裡取 20000 是因為再大的話就會遇到以下錯誤，需要重新開機。 ```shell [ 156.659689] general protection fault, probably for non-canonical address 0x761b1616a86d165c: 0000 [#1] PREEMPT SMP NOPTI ``` ::: * 使用空間 ```c int main() { FILE *file = fopen("output.csv", "w"); sorttime result; size_t start = 1000, end = 20000; size_t step = 500; for (size_t k = start; k <= end; k += step) { result = time_analysis(k); fprintf(file, "%zu,%llu,%llu,%llu\n", k, result.qsort, result.timsort, result.linuxsort); } fclose(file); return 0; } ``` 定義儲存每次各個演算法執行時間結構。 ```c typedef struct { unsigned long long qsort_user; unsigned long long timsort_user; unsigned long long linuxsort_user; unsigned long long qsort_kernal; unsigned long long timsort_kernal; unsigned long long linuxsort_kernal; } sorttime; ``` 實做 `time_analysis()`。 ```c sorttime time_analysis(size_t num) { /*...*/ clock_gettime(CLOCK_MONOTONIC, &start); ssize_t r_sz = read(fd, inbuf, size); clock_gettime(CLOCK_MONOTONIC, &end); /*...*/ } { /*Linux sort test*/ /*...*/ time.linuxsort = (end.tv_sec - start.tv_sec) * 1000000000LL + (end.tv_nsec - start.tv_nsec); /*...*/ } { /*qsort test*/ /*...*/ time.qsort = (end.tv_sec - start.tv_sec) * 1000000000LL + (end.tv_nsec - start.tv_nsec); /*...*/ } error: free(inbuf); if (fd > 0) close(fd); return time; } ``` * linux 核心空間我們在使用者空間需要透過 virtual file system 的檔案操作取得排序演算法模組在核心的執行時間，在 [/include/linux/fs.h](https://elixir.bootlin.com/linux/v6.8.9/source/include/linux/fs.h#L1983) 中找到回傳值為 long 的檔案操作 `unlocked_ioctl` ，這個很適合用來作為回傳執行時間的操作，所以在 file operation 新建了一個操作如下： ```c static const struct file_operations fops = { /*...*/ .unlocked_ioctl = sort_time, /*...*/ }; ``` 並在使用 `sort_read()` 呼叫 `sort_main()` 時回傳執行時間，並賦值給全域變數 `ktime_t kt` ，在使用者空間使用 VFS 操作 `ioctl(fd, 0, 0)` 取得執行時間。在使用 `queue_work()` API 的前後加入量測時間的 API，並將 `sort_main` 函式的回傳值設為 `ktime_t`。 ```c ktime_t sort_main(void *sort_buffer, size_t size, size_t es, sort_method_t sort_method) { /*...*/ kt = ktime_get(); /*sorting time*/ queue_work_on(cpu_id, workqueue, &ls->w); // queue_work(workqueue, &ls->w); if dont want task work on Specify cpu drain_workqueue(workqueue); kt = ktime_sub(ktime_get(), kt); /*...*/ return kt; } ``` * 為了能夠使執行時更準確，我們需要將我們要執行模組的 CPU 隔開只執行我們要的工作，我要指定 CPU 15 為執行模組的CPU 未指定 CPU。 ![image](https://hackmd.io/_uploads/HyeehXUGA.png) 限定 CPU 15 給特定的程式使用 isolcpus=cpu_id 將加在 GRUB 的設定檔中，這樣 Linux 的排程器在預設的狀況下就不會將任何一般行程放在這個被保留的 CPU 核中執行。 GRUB 設定檔 ```bash sudo vim /etc/default/grub ``` ``` GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=15" ``` ``` sudo update-grub reboot ``` 更改後： ![image](https://hackmd.io/_uploads/rysu1N8fC.png) 執行確認是否都在 CPU 15 上面。 ``` [ 648.867855] Set sort method: Quick Sort [ 648.867871] Do QSORT [ 648.867904] sort: [CPU#15] qsort_algo [ 648.868073] Set sort method: Tim Sort [ 648.868075] Do TIMSORT [ 648.868114] sort: [CPU#15] timsort_func [ 648.868115] Start timsort_algo [ 648.868241] Set sort method: Library Sort [ 648.868243] Do LINUXSORT [ 648.868245] sort: [CPU#15] linuxsort_algo ``` 使用者空間還有 Linux 核心模組執行時間圖表。 * 沒有將指定 CPU 限定只能給指定的工作。沒有指定 cpu。 ![time_analysis](https://hackmd.io/_uploads/S116ZeHz0.png) ![time_analysis](https://hackmd.io/_uploads/rkjTVMrf0.png) 指定 workqueue 在 cpu 15。 ![time_analysis](https://hackmd.io/_uploads/ryvutG8fC.png) [commit 7d9e0f3](https://github.com/sysprog21/ksort/commit/7d9e0f33b73574c89bc2ebfb4d2feb824e06fbce) 使用者和核心空間測量的執行時間可以看出核心的執行時間會略少於使用者的核心空間，但差距都不會很大，除了 Timsort，是因為在使用排序之前需要先將陣列轉換成 list api 的形式，排序之後要在轉回去陣列的形式，轉換的時間形成了圖上的差距。 * 將指定 CPU 限定只能給指定的工作。 ![time_analysis](https://hackmd.io/_uploads/Hkz3MVUGR.png) ### 考慮到產生亂數的時間和可預測性，改用 xorshift128+ 作為 PRNG，並善用 xoro 核心模組在執行 user 的之前就已經將 `xorshift128+` 核心模組載入核心，所以我們只需要知道模組 Device name 透過 `open()` 取得 file descriptor ，透過 VFS read() 操作取得亂數。 ```c #define XORO_DEV "/dev/xoro" sorttime time_analysis(size_t num) { int fd = open(KSORT_DEV, O_RDWR); int fdxoro = open(XORO_DEV, O_RDWR); if (fd < 0 || fdxoro < 0) { perror("Failed to open character device"); goto error; } /*...*/ size_t n_elements = num; size_t size = n_elements * sizeof(int); int *inbuf = malloc(size); int *xorobuf = malloc(size); if (!inbuf || !xorobuf) { goto error; } for (size_t n_bytes = 1; n_bytes <= n_elements; n_bytes++) { /* Clear/zero the buffer before copying in read data. */ zero_rx(); /* Read the response from the LKM. */ ssize_t n_bytes_read = read(fdxoro, rx, n_bytes); if (0 > n_bytes_read) { perror("Failed to read all bytes."); goto error; } uint64_t value_ = 0; for (int b_idx = 0; b_idx < n_bytes_read; b_idx++) { unsigned char b = rx[b_idx]; value_ |= ((uint64_t) b << (8 * b_idx)); } xorobuf[n_bytes - 1] = value_ % n_elements; } { /*Timsort test*/ memcpy(inbuf, xorobuf, size); } /*...*/ } { /*Linux sort test*/ memcpy(inbuf, xorobuf, size); /*...*/ } { /*qsort test*/ memcpy(inbuf, xorobuf, size); /*...*/ } } ``` :::info 這裡有一個很有趣的發現，如果透過 `xorshift128+` 得到的數值如果不取模同於就會造成排序錯誤，但是不知道為什麼。 ::: * 時間測量 ![time_analysis](https://hackmd.io/_uploads/Bk485kFMR.png) ![time_analysis](https://hackmd.io/_uploads/ryscAxYMC.png)