# Applied Computer Vision - 鄭文皇 (2022 Fall)
###### tags: `NYCU-2022-Fall`
## Class info.
[Course information](https://timetable.nycu.edu.tw/?r=main/crsoutline&Acy=111&Sem=1&CrsNo=535232&lang=zh-tw)
1. Learn the concepts and theories of Computer Vision (CV) and how they can be applied in practice to solve real-world problems.
2. Also cover the latest topics in current CV literature, such as self-supervised learning for CV applications.
Grading: homework 50%, midterm presentation 20%, final exam 30%
With invited speakers, adjusted to: homework 40%, midterm presentation 20%, final exam 30%, talk attendance 10%
<style>
.red{
color: red;
}
.blue{
color: #87ceeb;
}
</style>
## Date
### 9/12
Computer Vision:
feature engineering + model learning $\rightarrow$ deep learning
feature engineering: f = $f(I)$
model learning: y = $g(f,\theta)$
deep learning: y = $g(I,\theta)$
* Feature Detector
A subsystem of the visual system that detects the presence or absence of certain features in a visual scene.
Image data from the real world often display complex structure.
**In general, computer vision does not work. (except in certain cases)**
* Intra-class Variability
Images of the same class can look very different across photos.
### 9/19
* intensity: the brightness of a color, $\frac{R+G+B}{3}$
<table>
<tr>
<td>
<img src="https://i.imgur.com/S7fi26d.png" alt="drawing" width="400"/>
</td>
<td>
<img src="https://i.imgur.com/qZsEGsQ.png" alt="drawing" width="400"/>
</td>
</tr>
</table>
In comparison to global features, local features are more robust to occlusion and clutter.
* Properties of Ideal Local Feature
1. Repeatability
2. Distinctiveness / Informativeness (when the local structure changes, the feature should change as well)
3. Locality
4. Quantity
5. Accuracy
6. Efficiency
* [Sobel operator](https://zh.m.wikipedia.org/zh-tw/%E7%B4%A2%E8%B2%9D%E7%88%BE%E7%AE%97%E5%AD%90)
Before designing an edge detector:
1. Use derivatives (in the x and y directions) to locate points with high gradient
2. Smooth the image to reduce noise before taking derivatives
* Edge Detector in 1D & 2D
<table>
<tr>
<td>
<img src="https://i.imgur.com/QfolsiX.png" alt="drawing" width="500"/>
</td>
<td>
<img src="https://i.imgur.com/mvY53yA.png" alt="drawing" width="500"/>
</td>
</tr>
</table>
* [Convolution](https://iter01.com/480243.html)
Flip $g$, then slide it across $f$ by the shift $\tau$.
Continuous form: $(f*g)(n)=\int^{\infty}_{-\infty}f(\tau)g(n-\tau)d\tau$
Discrete form: $(f*g)(n)=\sum_{\tau=-\infty}^{\infty}f(\tau)g(n-\tau)$
<br>
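As a quick sanity check on the discrete formula, a minimal NumPy sketch (the signal and kernel are made-up toy values):
```python
import numpy as np

f = np.array([0., 1., 2., 3., 4.])   # toy signal
g = np.array([1., 0., -1.])          # toy kernel (a crude derivative filter)

# np.convolve flips g and slides it across f, exactly the discrete sum above
print(np.convolve(f, g, mode='full'))   # length len(f) + len(g) - 1
print(np.convolve(f, g, mode='same'))   # trimmed to len(f)
```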
* Canny Edge Detection [實作文章](https://medium.com/@pomelyu5199/canny-edge-detector-%E5%AF%A6%E4%BD%9C-opencv-f7d1a0a57d19)
A large $\sigma$ detects large-scale edges; a small $\sigma$ detects fine features.
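A minimal OpenCV sketch of that $\sigma$ trade-off (the input path and thresholds are placeholders):
```python
import cv2

img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)   # placeholder image path

# sigma enters through the Gaussian smoothing applied before Canny:
# large sigma -> only large-scale edges survive; small sigma -> fine detail is kept
coarse = cv2.Canny(cv2.GaussianBlur(img, (0, 0), sigmaX=3.0), 50, 150)
fine = cv2.Canny(cv2.GaussianBlur(img, (0, 0), sigmaX=1.0), 50, 150)
```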
* Image Gradient
![](https://i.imgur.com/AloAd9o.png)
Magnitude: $\| \nabla f \|=\sqrt{(\frac{\partial f}{\partial x})^2 + (\frac{\partial f}{\partial y})^2}$
Direction: $\theta = \tan^{-1}(\frac{\partial f}{\partial y} / \frac{\partial f}{\partial x})$
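A minimal sketch of computing the gradient magnitude and direction from Sobel derivatives (the input path is a placeholder):
```python
import cv2
import numpy as np

img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE).astype(np.float32)

gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)   # df/dx
gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)   # df/dy

magnitude = np.sqrt(gx ** 2 + gy ** 2)
direction = np.arctan2(gy, gx)                   # radians, matching theta above
```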
* Harris Corner Detector [實作文章](https://www.796t.com/p/1343014.html)
Invariant to large **rotation** and **translation**, but ==not invariant to image scale==; it does not tell us the scale of the corner.
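A minimal OpenCV Harris sketch (the image path, parameters, and threshold are common placeholder values, not tuned ones):
```python
import cv2
import numpy as np

gray = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE).astype(np.float32)

# blockSize: neighbourhood size, ksize: Sobel aperture, k: Harris constant
response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
corners = response > 0.01 * response.max()   # boolean corner mask from the response map
```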
**Each zero crossing corresponds to an edge.**
<img src="https://i.imgur.com/xTI0KVQ.png" alt="drawing" width="500"/>
<br><br>
> Impulse response
<img src="https://i.imgur.com/TWWezna.png" alt="drawing" width="500"/>
<br><br>
[Laplace operator, Laplacian](https://zh.wikipedia.org/zh-tw/%E6%8B%89%E6%99%AE%E6%8B%89%E6%96%AF%E7%AE%97%E5%AD%90)
<table>
<tr>
<td>
<img src="https://i.imgur.com/IoHzxPT.png" alt="drawing" width="450"/>
</td>
<td>
<img src="https://i.imgur.com/rows5Hz.png" alt="drawing" width="450"/>
</td>
</tr>
</table>
* SIFT Algorithm
![](https://i.imgur.com/cGS2ocA.png)
![](https://i.imgur.com/0mXhV3H.png)
![](https://i.imgur.com/AFkWfrf.png)
The Gaussian on the right is twice that of the one on the left.
![](https://i.imgur.com/tjBo2gq.png)
### 9/26
:::info
**Appendix-SIFT**
![](https://i.imgur.com/NRwfNo1.png)
:::
* Keypoint Localization
![](https://i.imgur.com/F71YuBx.png)
![](https://i.imgur.com/AzRjEX2.png)
![](https://i.imgur.com/QHaS0DX.png)
* SIFT Descriptor
![](https://i.imgur.com/2GCYlTv.png)
[OpenCV SIFT](https://docs.opencv.org/3.4/da/df5/tutorial_py_sift_intro.html)
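A minimal usage sketch with OpenCV (assuming OpenCV ≥ 4.4, where SIFT lives in the main module; the image path is a placeholder):
```python
import cv2

gray = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)   # one 128-D descriptor per keypoint
vis = cv2.drawKeypoints(gray, keypoints, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
```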
* HoG (Histogram of Oriented Gradients)
[HoG](http://alex-phd.blogspot.com/2014/03/hog.html)
<table>
<tr>
<td>
<img src="https://i.imgur.com/8jI7586.png" alt="drawing" width="500"/>
</td>
<td>
<img src="https://i.imgur.com/SRqCNU2.png" alt="drawing" width="500"/>
</td>
</tr>
<tr>
<td>
<img src="https://i.imgur.com/nUgn10j.png" alt="drawing" width="500"/>
</td>
<td>
<img src="https://i.imgur.com/aKZ1Ca7.png" alt="drawing" width="500"/>
</td>
</tr>
<tr>
<td>
<img src="https://i.imgur.com/rjAgLxH.png" alt="drawing" width="500"/>
</td>
</tr>
</table>
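A minimal HoG extraction sketch with scikit-image (parameters follow the common Dalal–Triggs setup; the input is a placeholder array, and a recent scikit-image with the `visualize` keyword is assumed):
```python
import numpy as np
from skimage.feature import hog

img = np.random.rand(128, 64)   # placeholder grayscale detection window

features, hog_image = hog(img,
                          orientations=9,            # 9 orientation bins per cell
                          pixels_per_cell=(8, 8),
                          cells_per_block=(2, 2),    # blocks of 2x2 cells
                          block_norm='L2-Hys',
                          visualize=True)            # also return a visualisation image
```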
* LBP (Local Binary Patterns)
LBP is a non-parametric descriptor whose aim is to efficiently summarize the local structures of images.
![](https://i.imgur.com/T1589eR.png)
![](https://i.imgur.com/bqRLV0u.png)
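A small LBP sketch with scikit-image (placeholder image; 8 neighbours on a circle of radius 1):
```python
import numpy as np
from skimage.feature import local_binary_pattern

img = np.random.randint(0, 256, (64, 64)).astype(np.uint8)   # placeholder patch

# each pixel gets the 8-bit code of "neighbour >= centre" comparisons
lbp = local_binary_pattern(img, P=8, R=1, method='default')
hist, _ = np.histogram(lbp, bins=np.arange(257), density=True)   # 256-bin LBP histogram
```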
* Types of Object Detection
* Detection of specific categories
* Detection of specific instance
![](https://i.imgur.com/UEKJ7qM.png)
---
**Object Classification**
* [Image Classification Architectures review](https://medium.com/@14prakash/image-classification-architectures-review-d8b95075998f)
* ImageNet Dataset
* ImageNet with roughly 1000 images in each of 1000 categories.
* AlexNet
![](https://i.imgur.com/h7q82R7.png)
---
**Semantic Segmentation**
* Sliding Window
![](https://i.imgur.com/wsAJyhd.png)
* Downsampling & upsampling (to reduce the expensive cost of full-resolution convolutions)
![](https://i.imgur.com/A6FzDaX.png)
* [U-Net](https://ithelp.ithome.com.tw/articles/10240314)
---
**<span class="red">There is no universal agreement in the literature on the definitions of various vision subtasks</span>**
* Two Main Categories for Generic Object Detection
![](https://i.imgur.com/uXpPvPG.png)
* [Region Proposals](https://medium.com/curiosity-and-exploration/%E5%8F%96%E5%BE%97-region-proposals-selective-search-%E5%90%AB%E7%A8%8B%E5%BC%8F%E7%A2%BC-be0aa5767901)
* [R-CNN & Fast R-CNN](https://zhuanlan.zhihu.com/p/40986674)
* [[Paper] EDF-SSD: An Improved Feature Fused SSD for Object Detection](https://jackson1998.medium.com/paper-edf-ssd-an-improved-feature-fused-ssd-for-onjection-detection-213c4566745)
### 10/3
Convolution kernel size: height × width × depth
> Reduces the computation along the depth dimension. Not really a 1x1 convolution → it is a 1×1×C convolution
![](https://i.imgur.com/llKweEN.png)
* A Fire module is comprised of:
a squeeze convolution layer (which has only 1x1 filters), feeding into an expand layer that has a mix of 1x1 and 3x3 convolution filters.
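A minimal PyTorch sketch of such a Fire module (channel sizes are illustrative, not the exact SqueezeNet configuration):
```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)          # 1x1xC "squeeze"
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))                         # reduce channel depth first
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)   # mix of 1x1 and 3x3 filters

out = Fire(96, 16, 64, 64)(torch.randn(1, 96, 56, 56))   # -> shape (1, 128, 56, 56)
```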
:::info
Top-1 ImageNet accuracy: only one prediction is allowed, and it must match the label
Top-5 ImageNet accuracy: five predictions are allowed, and the label must be among them
:::
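A small sketch of how Top-1 / Top-5 accuracy are computed from logits (random tensors stand in for real predictions):
```python
import torch

def topk_accuracy(logits, labels, k):
    # logits: (N, num_classes), labels: (N,)
    topk = logits.topk(k, dim=1).indices                  # (N, k) predicted class ids
    correct = (topk == labels.unsqueeze(1)).any(dim=1)    # does the label appear among the top k?
    return correct.float().mean().item()

logits = torch.randn(8, 1000)
labels = torch.randint(0, 1000, (8,))
print(topk_accuracy(logits, labels, 1), topk_accuracy(logits, labels, 5))
```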
![](https://i.imgur.com/ohuJBu0.png)
> Skip connections not only skip one layer
The advantage of adding this type of **skip connection** is that if any layer hurts the performance of the architecture, it can be skipped by regularization.
So this allows training very deep neural networks without the problems caused by vanishing/exploding gradients.
In conclusion, ResNets are among the most effective neural network architectures, as they help in **maintaining a low error rate much deeper in the network.**
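A minimal PyTorch sketch of an identity skip connection (a basic residual block; dimensions are illustrative):
```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """out = F(x) + x, so gradients can flow through the identity path."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        return self.relu(out + x)          # the skip connection

y = BasicBlock(64)(torch.randn(1, 64, 32, 32))
```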
* DenseNet
![](https://i.imgur.com/EJaIlOp.png)
* [Feature Pyramid Networks](https://ivan-eng-murmur.medium.com/%E7%89%A9%E4%BB%B6%E5%81%B5%E6%B8%AC-s8-feature-pyramid-networks-%E7%B0%A1%E4%BB%8B-99b676245b25)
![](https://i.imgur.com/eE1pm5S.png)
* [A Simple yet Effective Approach for Identifying Unexpected Road Obstacles](https://zhuanlan.zhihu.com/p/415220541)
* [Deep Learning for Generic
Object Detection: A Survey](https://www.796t.com/content/1545903385.html)
---
* Conventional two-stage solutions adopt the detect-then-segment approach → **<span class="red">Slow</span>**
* Focus on single-stage instance segmentation
![](https://i.imgur.com/RHemaxC.png)
* Local-mask-based Methods
* Contours with Explicit Encoding
* ExtremeNet (Four extreme points with one center point of objects)
The center point can also be derived from the four extreme directions. (In practice, there may be more than one extreme point along a given direction.)
* PolarMask: It utilizes rays at constant angle intervals from the center to describe the contour.
* FourierNet: a contour shape decoder using Fourier transform
* Compact Mask Encoding
:::info
**Contours with Explicit Encoding**
pros: fast at inference and easy to optimize.
cons: cannot depict the mask precisely and cannot describe objects that have holes in the center.
:::
* Global-mask-based Methods
* YOLACT: attempting real-time instance segmentation
![](https://i.imgur.com/u79wRiw.png)
* BlendMask
### 10/10
National Day holiday, no class.
### 10/17
* Challenge of Long-Tailed Visual Recognition
![](https://i.imgur.com/6MIP3q1.png)
* Loss function
* MSE:$$f^* = \rm{arg} \min_f \mathbb{E}_{x,y \sim p_{data}} \| y - f(x) \|^2$$
* MAE:$$f^* = \rm{arg} \min_f \mathbb{E}_{x,y \sim p_{data}} \| y - f(x) \|_1$$
* [Cross Entropy](https://zh.wikipedia.org/zh-tw/%E4%BA%A4%E5%8F%89%E7%86%B5): $$L = -\frac{1}{m} \sum_{i=1}^m y_i \cdot \ln(\hat{y}_i)$$
![](https://i.imgur.com/1bPlXyM.png)
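A minimal NumPy sketch of the three losses above on a single toy example:
```python
import numpy as np

y = np.array([1.0, 0.0, 0.0])        # one-hot target
y_hat = np.array([0.7, 0.2, 0.1])    # predicted probabilities

mse = np.mean((y - y_hat) ** 2)
mae = np.mean(np.abs(y - y_hat))
cross_entropy = -np.sum(y * np.log(y_hat))   # single-sample form of the CE formula above
print(mse, mae, cross_entropy)
```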
* Solutions in the Literature for Long-Tailed Visual Recognition
* Re-sampling:
* over-sampling (adding repetitive data) for the minority class
* under-sampling (removing data) for the majority class
* Re-weighting: $$L = -\sum^{\mathcal{C}}_{i=1} w_i y_i \log p_i$$
* Class-Balanced Loss
$$
\rm{CB}(\textbf{p}, y) = \frac{1}{E_{n_{y}}} \mathcal{L} (\textbf{p}, y) = \frac{1 - \beta}{1 - \beta^{n_y}} \mathcal{L}(\textbf{p}, y)
$$
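A small sketch of the class-balanced weight $\frac{1-\beta}{1-\beta^{n_y}}$ on made-up long-tailed class counts:
```python
import numpy as np

n_per_class = np.array([5000, 500, 50])   # illustrative head/medium/tail class counts
beta = 0.999

cb_weights = (1.0 - beta) / (1.0 - beta ** n_per_class)
cb_weights = cb_weights / cb_weights.sum() * len(n_per_class)   # optional normalisation
print(cb_weights)   # rare classes get the larger weights
```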
* re-balancing = re-sampling + re-weighting
![](https://i.imgur.com/rfCfgLI.png)
![](https://i.imgur.com/WhUeC18.png)
1. Feature extractor
2. classifier
![](https://i.imgur.com/6ycKe2S.png)
:::info
What is transfer learning?
Transfer learning is about **leveraging feature representations from a pre-trained model**, so you don't have to train a new model from scratch.
The pre-trained models are usually trained on massive datasets that are a standard benchmark in the computer vision frontier.
:::
* Bilateral-Branch Network
![](https://i.imgur.com/KBjJuMy.png)
![](https://i.imgur.com/PsFDarz.png)
---
[Mitigating Dataset Bias (BMVC 2020 Keynote)](https://www.youtube.com/watch?v=HAfB9qvGfMM)
* Dataset bias
<img src="https://i.imgur.com/TdZ8MAf.png" alt="drawing" width="500"/>
![](https://i.imgur.com/Yt1RuYw.png)
* Techniques that help deal with data bias
* Collect labelled data from target domain
* Better backbone CNNs
* Batch Normalization ([Li'17](https://arxiv.org/pdf/1603.04779.pdf), [Chang’19])
* Instance Normalization + Batch Normalization [Nam'19](https://proceedings.neurips.cc/paper/2018/file/018b59ce1fd616d874afad0f44ba338d-Paper.pdf)
* Data Augmentation, Mix Match [Berthelot'19](https://arxiv.org/pdf/1905.02249.pdf)
* Semi-supervised methods, such as Pseudo labeling [Zou’19](https://arxiv.org/pdf/1908.09822.pdf)
* Domain Adaptation (this talk)
* ==**Adversarial domain alignment**==
* Feature space
* Pixel space
![](https://i.imgur.com/68isLg1.png)
### 10/24
* Pixel-space alignment
![](https://i.imgur.com/eyK0Lsh.png)
* Few-shot domain translation
Lots of unlabeled target data, but only have 1-5 images of the target domain
![](https://i.imgur.com/hiNOFIZ.png)
* Disentangled features
![](https://i.imgur.com/1IrR44G.png)
![](https://i.imgur.com/UZdbch0.png)
![](https://i.imgur.com/2v2I57M.png)
* Weak Scene-level Alignment
![](https://i.imgur.com/kfVemRy.png)
![](https://i.imgur.com/ysbuiVR.png)
* Alignment that respects class boundaries
![](https://i.imgur.com/tiRXwYL.png)
* Category Shift
When categories aren't the same in source and target
![](https://i.imgur.com/1txrbNc.png)
![](https://i.imgur.com/Vx9AqJF.png)
---
![](https://i.imgur.com/Ky27di6.png)
* Recognition of Static Pose
* Recognition of Dynamic Pose
* Pose Model
![](https://i.imgur.com/avxJUjP.png)
* Inverse Kinematics
![](https://i.imgur.com/2sV5fKb.png)
* Exploiting Temporal Dependence
![](https://i.imgur.com/WWJiVDJ.png)
### 10/31
* Recurrent Neural Networks (RNN)
![](https://i.imgur.com/ZzB1b9S.png)
![](https://i.imgur.com/rhXx7IQ.png)
![](https://i.imgur.com/aFAnoXo.png)
* RNN cell
![](https://i.imgur.com/zWRW9vC.png)
:::info
**The Problem of RNN: Short-term Memory**
If a sequence is long enough, they’ll have a hard time carrying
information from earlier time steps to later ones.
**Long Short Term Memory (LSTM)** was created as the solution to short-term memory.
It has internal mechanisms called gates that can
regulate the flow of information.
[RNN Notes](https://hackmd.io/3PzYYuBBTNCgRymLI2fUuw?view)
:::
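A minimal PyTorch usage sketch of an LSTM over a toy sequence (all dimensions are illustrative):
```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=64, num_layers=1, batch_first=True)
x = torch.randn(4, 10, 32)            # (batch, time steps, features)
outputs, (h_n, c_n) = lstm(x)         # outputs: (4, 10, 64); h_n, c_n: (1, 4, 64)
```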
* GRU (Gated Recurrent Unit)
![](https://i.imgur.com/KI0nbrg.png)
* Deep LSTM
![](https://i.imgur.com/DBOr9ss.png)
* Two-way LSTM
![](https://i.imgur.com/TAlAU9Z.png)
* Connectionist Temporal Classification (CTC)
![](https://i.imgur.com/vyKLsw3.png)
### 11/7
* Attention Model
![](https://i.imgur.com/uZb0cur.png)
$c$ is the context, and the $y_i$ are the “part of the data” we are looking at.
$$
m_i = \rm{tanh}(W_{cm}c + W_{ym}y_i)
$$
The network computes $m_1, \dots, m_n$ with a tanh layer; the weights $s_i$ are then obtained by applying a softmax over the $m_i$:
$$
softmax(x_1, ..., x_n) = (\frac{e^{x_i}}{\sum_j e^{x_j}})_i \\
z = \sum_i(s_iy_i)
$$
The output $z$ is the weighted arithmetic mean of all the $y_i$, where the weights represent the relevance of each $y_i$ according to the context $c$.
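A small NumPy sketch of this soft attention (random placeholder weights; the reduction of each $m_i$ to a scalar score is an assumption here, a learned projection is also common):
```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d_c, d_y, d_m, n = 8, 8, 16, 5
c = rng.normal(size=d_c)               # context vector
Y = rng.normal(size=(n, d_y))          # the y_i we attend over
W_cm = rng.normal(size=(d_m, d_c))
W_ym = rng.normal(size=(d_m, d_y))

m = np.tanh(W_cm @ c + Y @ W_ym.T)     # row i is m_i = tanh(W_cm c + W_ym y_i)
s = softmax(m.sum(axis=1))             # scalar relevance per y_i, then softmax -> s_i
z = (s[:, None] * Y).sum(axis=0)       # z = sum_i s_i y_i
```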
---
:::info
* CV Weekly
- Generate video from text
- DIFFUSIONDB: Dataset for Text-to-Image Generative Models
:::
![](https://i.imgur.com/5xiycMO.png)
* 3D data representation
<style>
.bf{
font-weight: bold;
}
</style>
<table>
<tr>
<td>
<p class = "bf">Point Cloud</p>
</td>
<td>
<p class = "bf">Mesh</p>
</td>
</tr>
<tr>
<td>
A point cloud is a set of data points in space, which
measures a large number of points on the external
surfaces of objects around them.
</td>
<td>
A mesh is a collection of vertices, edges and faces that defines
the shape of a polyhedral object. The faces usually consist of triangles
(triangle mesh), quadrilaterals, or other simple convex polygons.
</td>
</tr>
<tr>
<td>
<p class="bf">Voxel</p>
</td>
<td>
<p class="bf">Multi-View Images</p>
</td>
</tr>
<tr>
<td>
A voxel represents a value on a regular grid in three-dimensional
space.
</td>
<td>
Multi-view images are multiple looks of
the same target, e.g., at different viewing
angles, perspectives, and so forth.
</td>
</tr>
</table>
![](https://i.imgur.com/twtSUpj.png)
* Deep Learning on Multi-view Representation
![](https://i.imgur.com/Od4I514.png)
* Challenge
Handling the irregular geometric form: the representation must be consistent across different point orderings (permutations).
![](https://i.imgur.com/YPs8L54.png)
![](https://i.imgur.com/OMmRxrI.png)
**Permutation invariance: Symmetric function**
$$
f(x_1, x_2, ..., x_n) \equiv f(x_{\pi_1}, x_{\pi_2}, ... x_{\pi_n}), x_i \in \mathbb{R}^D
$$
Examples:
$$
f(x_1, x_2, ..., x_n) = \max\{x_1, x_2, ..., x_n\} \\
f(x_1, x_2, ..., x_n) = x_1 + x_2 + ... + x_n
$$
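A small sketch showing why max-pooling over per-point features is such a symmetric function (the core PointNet idea; the shared layer is a single random linear map here):
```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(128, 3))      # unordered point cloud, n = 128, D = 3
W = rng.normal(size=(3, 64))            # shared per-point feature map (stand-in for an MLP)

def global_feature(pts):
    return np.maximum(pts @ W, 0.0).max(axis=0)   # element-wise max over all points

perm = rng.permutation(len(points))
assert np.allclose(global_feature(points), global_feature(points[perm]))   # order-invariant
```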
![](https://i.imgur.com/pGBnssR.png)
![](https://i.imgur.com/haVjm0e.png)
**Input Alignment by Transformer Network**
![](https://i.imgur.com/D2Sd900.png)
* PointNet Architecture
![](https://i.imgur.com/pDIOtw0.png)
---
* Recap RNN /LSTM
[RNN Notes](https://hackmd.io/3PzYYuBBTNCgRymLI2fUuw?view)
* Transformer network
[Transformer Notes](https://hackmd.io/ba1UQdFqRAGnhN9_eTZeog)
[PyTorch Transformer](https://pytorch.org/tutorials/beginner/transformer_tutorial.html)
* [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://zhuanlan.zhihu.com/p/266311690)
* Vision Transformer
![](https://i.imgur.com/WD0qtbs.jpg)
### 11/14
**Course schedule adjustment**
| Dates | Topic |
| -------- | -------- |
| 11/21 | Invited Talks |
| 11/28 | Invited Talks |
| 12/05 | Midterm Presentation |
| 12/12 | Midterm Presentation |
| 12/19 | Invited Talks / Deep Generation Modeling |
| 12/26 | Final Examination |
* Homework 2: Transformer
* Homework 3: a 500-character reflection on the invited talks
---
* [Tokens-to-Token ViT: Training Vision Transformers from Scratch on Imagenet](https://zhuanlan.zhihu.com/p/359930253)
* [Mobile-Former: Bridging MobileNet and Transformer](https://zhuanlan.zhihu.com/p/412964831)
* [EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers](https://blog.51cto.com/shanglianlm/5550217)
* [End-to-End Object Detection with Transformers](https://allen108108.github.io/blog/2020/07/27/[%E8%AB%96%E6%96%87]%20End-to-End%20Object%20Detection%20with%20Transformers/)
### 11/21
#### Low-Light Image Enhancement
:::info
Peking University - Prof. 劉家瑛 (Jiaying Liu)
:::
* Research topic:
* Image Reconstruction
* Image/Video Coding
* Image Generation
* Video Analytics
* Low-Light Degradation
* Intensive noise
**Problem: High-level vision in low-light scenarios**
* Representative work
* Histogram equalization
* Dehazing method (invert $\rightarrow$ dehaze $\rightarrow$ invert again)
* Retinex Model (retinex decomposition ($S = R \cdot L$) / generate result ($S_{enhance} = R \cdot L^{\frac{1}{\gamma}}$))
* Learning-Based Model (LLNet/LLCNN...)
* Low-Light Datasets for High-Level Tasks (KAIST / Exclusively Dark)
* [Deep Retinex Decomposition for Low-Light Enhancement](https://zhuanlan.zhihu.com/p/87384811)
* Retinex Theory + Deep Learning
* Dataset: LOL (LOw-Light)
* [Benchmarking Low-Light Image Enhancement and Beyond](https://zhuanlan.zhihu.com/p/467789757)
* Paired datasets: LLNet
* Unpaired datasets: cannot support supervised model training
* VE-LOL: for evaluating both low-level and high-level vision
* **UG2 challenge**
* [HLA-Face: Joint High-Low Adaptation for Low Light Face Detection](https://blog.csdn.net/weixin_45709330/article/details/116375825)
* Gaps between normal light and low light (pixel-level appearance / object-level semantics)
* Consider joint low-level and high-level adaptation
* [Self-Aligned Concave Curve: Illumination Enhancement for Unsupervised Adaptation](https://arxiv.org/abs/2210.03792)
* Training strategy: asymmetric self-supervised alignment
#### Multi-View 3D Modeling of Non-Diffuse Objects with Complex Materials
:::info
Australian National University - Prof. Hongdong Li
:::
* Research topic:
* Computer Vision
* Robotic Vision
* Smart Car Project
* City Modeling
* Bionic Eyes Project
* [Multi-view 3D Reconstruction of a Texture-less Smooth Surface of Unknown Generic Reflectance](https://openaccess.thecvf.com/content/CVPR2021/papers/Cheng_Multi-View_3D_Reconstruction_of_a_Texture-Less_Smooth_Surface_of_Unknown_CVPR_2021_paper.pdf)
* Vision-based 3D Shape Reconstruction
* (Rigid object / scene) Structure from Motion
* Lambertian / Non-Lambertian
* Problem Setting: Traditional Photometric Stereo problem
* 3D computer vision $\leftrightarrow$ image inversion
* **The rendering equation**
* Solution: Minimizing a suitable objective (loss) function (augmented Lagrangian method relaxation)
image formation + surface regularization + relaxation penalty
* [Diffeomorphic Neural Surface Parameterization for 3D and Reflectance Recovery](https://dl.acm.org/doi/10.1145/3528233.3530741)
* Shape deformation
* Learning / training process: Inverse graphics rendering
* Recap
* Multi-view 3D reconstruction for object with unknown materials.
* Significantly outperforms SOTAs under unknown illuminations
* Achieves similar accuracy to darkroom methods but much more flexible
* Robust to complex shapes and specular materials
* Reconstructions can be easily plugged into rendering engines
* Limitations: <span class="red">piecewise-smooth object shape assumption with simple topology; needs a strong flashlight (for SNR); slow convergence</span>
#### Consistent, Empathetic and Prosocial Dialogues
:::info
Prof. Gunhee Kim
:::
* [ProsocialDialog: A Prosocial Backbone for Conversational Agents](https://arxiv.org/pdf/2205.12688.pdf)
* [Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling](https://arxiv.org/abs/2107.03451)
* Dataset: DailyDialog / PersonaChat / EmpatheticDialogues... (All of those are biased towards positivity)
* Classification models trained on GoEmotions
* Canary / Prost
* [Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness](https://arxiv.org/abs/2004.05816)
* *Public self-consciousness* is the awareness of the self as a social object that can be observed and evaluated by others
* Bayesian Rational Speech Acts framework, which was originally applied to improving the informativeness of referring expressions.
* [Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes](https://arxiv.org/abs/2109.08828)
* Related work
* Empathetic dialogue modeling
* Emotion Cause (Pair) Extraction
* Rational Speech Acts (RSA) framework
### 11/28
#### [Context Autoencoder for Scalable Self-Supervised Representation Pretraining](https://arxiv.org/abs/2202.03026)
:::info
Baidu computer vision expert - Jingdong Wang (王井东)
:::
- Vision Foundation Models
- Big Data
- Big Parameter
- Big Task
- Big Algorithm
- Big Computation
- Representation Pretraining
- Goal: Learn an encoder mapping an image to a representation
- Pretraining Task $\rightarrow$ Downstream Task
- Scale up: sample scale (not for supervised; yes for semi-supervised / vision-language / self-supervised), concept scale (not for supervised / semi-supervised; yes for vision-language / self-supervised)
- Self-Supervised Representation Pretraining in Vision
- Contrastive pretraining
- Masked image modeling
- Other
- CAE: representation pretraining aims to learn an encoder, <span class="red">mapping an image to a representation that can be transferred to downstream task.</span>
- <span class="red">Regressor for masked image modeling $\rightarrow$ masked representation modeling:</span>
make predictions for masked patches from visible patches in the encoded representation space for solving the masked image modeling task.
- The encoder is <span class="red">dedicated for</span> representation pretraining, and representation pretraining is <span class="red">only by</span> the encoder.
- The task completion part (regressor and decoder) is <span class="red">separated</span> from the encoder.
<center>
<img src = "https://i.imgur.com/YtAg3d1.png">
<p>Figure 1: Context autoencoder</p>
</center>
- How does contrastive pretraining work?
- How can the representations of random crops from the same original image be similar?
- Speculation: the encoder extracts the representation of the <span class="red">part</span> of the object / the projector maps the part representation to the representation of the <span class="red">whole object</span>
- The projected representations then agree
- What representations are learned?
- Observation: What random crops share lies in the <span class="red">center</span> of the original image / the object in an ImageNet image lies in the <span class="red">center</span>
- Conjecture: Contrastive pretraining mainly <span class="red">learns the semantics of the center region</span>
**[Github repo.](https://github.com/lxtGH/CAE)**
<center>
<img src = "https://i.imgur.com/A9RqRId.png">
<p>Table 1: Pretraining quality evaluation</p>
</center>
#### Relational and Structural Vision with High-Order Feature Transforms
:::info
POSTECH - Minsu Cho
:::
**Match and transfer**
- Relational Self-Attention: What's Missing in Attention for Video Understanding
- [SPair-71k: A Large-scale Benchmark for Semantic Correspondence](http://cvlab.postech.ac.kr/research/SPair-71k/)
- [Convolutional Hough Matching Networks](https://arxiv.org/abs/2103.16831)
- [TransforMatcher: Match-to-Match Attention for Semantic Correspondence](https://arxiv.org/abs/2205.11634)
- Few-shot image segmentation
- [Hypercorrelation Squeeze for Few-Shot Segmentation](https://openaccess.thecvf.com/content/ICCV2021/papers/Min_Hypercorrelation_Squeeze_for_Few-Shot_Segmentation_ICCV_2021_paper.pdf)
- Structure of correspondence in space
- [Learning to Discover Reflection Symmetry via Polar Matching Convolution](https://arxiv.org/abs/2108.12952)
- Motion-aware video recognition
- [Learning Self-Similarity in Space and Time as Generalized Motion](https://arxiv.org/abs/2102.07092)
- Relational Self-Attention
- [Relational Self-Attention: What's Missing in Attention for Video Understanding](https://arxiv.org/abs/2111.01673)
- Summary
- Real-world vision systems need to leverage relational and structural patterns of images and videos for systematic understanding.
- High-order convolution or self-attention is effective for capturing relational structures by considering geometric patterns of correlation.
- Learning relational structures is crucial for minimally-supervised recognition and structural perception of images and videos.
#### AURORA - Empirical Bayes from Replicates
:::info
Stanford University - Dennis L. Sun
:::
- [Empirical Bayes mean estimation with nonparametric errors via order statistic regression on replicated data](https://arxiv.org/abs/1911.05970)
- Estimate some quantity $\mu_i$ from noisy observations $\textbf{Z} = \{Z_1, ... Z_N\}$.
- Empirical Bayes: First estimate $A$ using the data, then plug it into the prior.
- Prior: $G = \mathcal{N}(0, A)$
- Likelihood: $F(\cdot \ | \ \mu_i) = \mathcal{N}(\mu_i, \sigma^2)$
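For this Gaussian prior/likelihood pair, the Bayes estimate is a simple shrinkage rule; a hedged sketch of the standard closed form (the moment estimator for $A$ is one common choice, not necessarily the exact estimator used in the talk):
$$
\hat{\mu}_i = \mathbb{E}[\mu_i \mid Z_i] = \frac{A}{A + \sigma^2} Z_i,
\qquad
\hat{A} = \max\Big(0,\ \frac{1}{N}\sum_{i} Z_i^2 - \sigma^2\Big),
$$
since marginally $Z_i \sim \mathcal{N}(0, A + \sigma^2)$.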
### 12/21
**Deploying CV at Edge - From Recent Vision Transformer to Future Metaverse**
##### Computing and AI Technology Group, MediaTek Inc.
#### Part1 Overview
* NIPS
* Marching toward metaverse era
#### Part2 Deploying Vision transformer at edge
* Computer vision research evolves rapidly
* How to use it in our daily devices
:::info
鄭嘉珉, Senior Manager, MediaTek
:::
The talk focuses on experience sharing, especially on taking CV research to production at MediaTek.
#### NIPS
[Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding](https://arxiv.org/pdf/2205.11487.pdf)
Important topics:
* Adversarial robustness
* Federated learning
* Diffusion models
* NeRF (Neural Radiance Field)
* NeMF (Neural Motion Field)
* CCNeRF (Compressible-composable NeRF)
* GNN
#### Metaverse
##### Challenge
* High computing
* Low latency
* Low power
* Tiny form factor
* Display:
* immersive display experience
* Graphics
* Motion-to-Photon latency
* VR : under 20ms
* AR : under 5ms
* Concurrent multiple tasks
Growth rates: DNN compute demand > edge processing capability > Moore's law
---
:::info
姜政銘(Jimmy Chiang)
:::
* Edge AI in MTK
* Vision Transformer
* The AI talent MTK values
* Advice for those about to enter the workforce
Edge AI KSF:
Noise Reduction, Super Resolution
#### CAI Department
Tasks assigned to the AI-ALG (algorithm) team:
* AI CV
* AI NLP
* AI Network
* AI Methodology
* AI for 5G
* AI Architecture
AI-SW:
* Connects the stack: GPU → CUDA → PyTorch → Python code
* NeuroPilot SW connects to the GPU on the phone
AI-HW:
* How to design a high-efficiency APU under a limited cost budget
#### What does it take to run a trained model on a phone?
1. How to integrate NAS and quantization?
2. How to export to a format the platform supports?
3. What if the result is extremely slow?
#### Vision transformer
1. Patch embedding
* Operation
* Challenges in APU
* memory access is one of the bottlenecks in APU
* Patch-wise is like 'Sliding window' in convolution
* Patch size
2. Multihead Self-attention - Challenge
* global self-attention requires quadratic computing complexity
* The biggest challenge in the APU ⇒ over 95% of the latency cost in ViT
* Matrix multiplication
* Softmax
Summary:
* Global attention has better quality but suffers from costly MatMul and Softmax
* Cross-covariance attention is favorable for high-resolution, low-channel settings
#### Softmax Complexity
* Softmax: the naive formula does not work due to numerical stability (overflow)
* Most AI accelerators support float16 instead of float32 to get better PPA (Performance, Power, Area)
* What happens when using float16? Underflow
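A small NumPy sketch of the overflow issue and the usual max-subtraction fix (values are illustrative):
```python
import numpy as np

x = np.array([10.0, 50.0, 90.0], dtype=np.float16)   # toy logits

naive = np.exp(x) / np.exp(x).sum()     # exp(90) overflows to inf (even in float32)

shifted = np.exp(x - x.max())           # subtract the max so all exponents are <= 0
stable = shifted / shifted.sum()

# in float16, exponents of very negative shifted logits can still underflow to 0,
# which is the accuracy concern mentioned above
```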
#### Norm-Layer Challenge
* Overflow occurs after the Mul that computes the variance $\sigma^2$
* Underflow occurs in Rsqrt
#### MLP-GELU Challenge Overview
* GELU activation is widely used in Transformers
* It is impractical to implement the error function in an AI accelerator!
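The usual workaround is the tanh approximation of GELU, which avoids the error function; a small sketch comparing it with the exact form (this is the standard approximation from the GELU paper, not necessarily what the APU implements):
```python
import numpy as np
from math import erf

def gelu_exact(x):
    return np.array([0.5 * v * (1.0 + erf(v / np.sqrt(2.0))) for v in x])

def gelu_tanh(x):   # accelerator-friendly: only tanh, multiply and add
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

x = np.linspace(-4.0, 4.0, 9)
print(np.abs(gelu_exact(x) - gelu_tanh(x)).max())   # the approximation error is small
```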
#### What papers might not tell you, but matter in edge AI
* Low MACs/FLOPs do not imply high efficiency
* Accuracy reported in a paper does not guarantee accuracy on an edge device
* Papers report performance on mobile CPU and GPU
#### Career advice
* General principles
- Fundamentals
- Teamwork and communication
- Curiosity
- Independent thinking
- A learning mindset
- What kind of talent?
- Algorithms, hardware, software
- When applying, prepare your résumé and your slides
## Paper list
| Paper | Conference / Year |
| -------- | -------- |
| You Only Cut Once: Boosting Data Augmentation with a Single Cut | ICML / 2022 |
| Scaled-YOLOv4: Scaling Cross Stage Partial Network | CVPR / 2021 |
| MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation | CVPR / 2022 |
| Taming Transformers for High-Resolution Image Synthesis | CVPR / 2021 |
| BEiT: BERT Pre-Training of Image Transformers | ICLR / 2022 |
| GAN-Supervised Dense Visual Alignment | CVPR / 2022 |
| Point-BERT: Pre-Training 3D Point Cloud Transformers with Masked Point Modeling | CVPR / 2022 |
| FMODetect: Robust Detection of Fast Moving Objects | ICCV / 2021 |
| Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | ICCV / 2021 |
| Boosting Crowd Counting via Multifaceted Attention | CVPR / 2022 |
| Focal and Global Knowledge Distillation for Detectors | CVPR / 2022 |
| VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution | CVPR / 2022 |
| RefineFace: Refinement Neural Network for High Performance Face Detection | TPAMI / 2021 |
| Restormer: Efficient Transformer for High-Resolution Image Restoration | CVPR / 2022 (Oral) |
| Learning the Degradation Distribution for Blind Image Super-Resolution | CVPR / 2022 |
| Pose Recognition With Cascade Transformers | CVPR / 2021 |
| Deep Constrained Least Squares for Blind Image Super-Resolution | CVPR / 2022 |
| ACPL: Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification | CVPR / 2022 |
| CoMoGAN: Continuous Model-guided Image-to-Image Translation | CVPR / 2021 |
| TrackFormer: Multi-Object Tracking with Transformers | CVPR / 2022 |
| Contrastive Embedding for Generalized Zero-Shot Learning | CVPR / 2021 |
| Masked Autoencoders Are Scalable Vision Learners | CVPR / 2022 |
| Crafting Better Contrastive Views for Siamese Representation Learning | CVPR / 2022 |
| GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields | CVPR / 2021 |
| Scaling Vision Transformers | CVPR / 2022 |
| Image-Adaptive YOLO for Object Detection in Adverse Weather Conditions | AAAI / 2022 |
| EditGAN: High-Precision Semantic Image Editing | NeurIPS / 2021 |
## Final exam (Open anything)
1. Local Binary Patterns (15%)
    - How is it computed?
    - Given three image patches, compare their similarity to the original image
2. Compute the attention output $Z$; the formula and the $K, V, Q$ matrices are given (20%)
3. Given the paper [MetaFormer is Actually What You Need for Vision](https://openaccess.thecvf.com/content/CVPR2022/papers/Yu_MetaFormer_Is_Actually_What_You_Need_for_Vision_CVPR_2022_paper.pdf), explain how it differs from the original Transformer and how it improves performance (20%)
4. Given the paper
[Disentangling 3D Pose in A Dendritic CNN for Unconstrained 2D Face Alignment](https://openaccess.thecvf.com/content_cvpr_2018/papers/Kumar_Disentangling_3D_Pose_CVPR_2018_paper.pdf)
    - Compare [3D STN](https://arxiv.org/pdf/1707.05653.pdf) with this paper: what are the differences, and the pros and cons of each? (15%)
    - How do hard samples increase model robustness? Refer to [Hard Sample Mining](https://arxiv.org/pdf/1606.04232.pdf) (10%)
5. Course feedback (20%)
## Reference
[Request an electronic copy of the textbook](https://szeliski.org/Book/)