Applied Computer Vision - 鄭文皇 (2022 Fall)
tags: NYCU-2022-Fall
Class info.
Grading: homework 50%, midterm report 20%, final exam 30%.
With invited speakers, changed to: homework 40%, midterm report 20%, final exam 30%, talk attendance 10%.
Date
9/12
Computer Vision:
feature engineering + model learning \(\rightarrow\) deep learning
feature engineering: \(f = f(I)\)
model learning: \(y = g(f, \theta)\)
deep learning: \(y = g(I, \theta)\)
A subsystem of the visual system for detecting the presence or absence of certain features in a visual scene.
Image data from the real world often display complex structure.
In general, computer vision does not work. (except in certain cases)
The same image category, but presented in different ways across photos.
9/19
In comparison to global features, local features are more robust to occlusion and clutter.
Before designing an edge detector
Flip \(g\), then slide it across \(f\) according to the value of \(\tau\).
Continuous form: \((f*g)(n)=\int^{\infty}_{-\infty}f(\tau)g(n-\tau)d\tau\)
Discrete form: \((f*g)(n)=\sum_{\tau=-\infty}^{\infty}f(\tau)g(n-\tau)\)
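The discrete form maps directly to code. A minimal NumPy sketch (illustrative only, with made-up input arrays) that matches `np.convolve`:

```python
import numpy as np

def conv1d(f, g):
    """Discrete convolution: (f*g)[n] = sum_tau f[tau] * g[n - tau]."""
    n_out = len(f) + len(g) - 1
    out = np.zeros(n_out)
    for n in range(n_out):
        for tau in range(len(f)):
            if 0 <= n - tau < len(g):
                out[n] += f[tau] * g[n - tau]
    return out

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])
print(conv1d(f, g))        # direct implementation of the sum above
print(np.convolve(f, g))   # NumPy's built-in gives the same result
```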
Canny Edge Detection implementation article
A large \(\sigma\) detects large-scale edges; a small \(\sigma\) detects fine features.
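A quick OpenCV sketch of this pipeline, assuming an input file named `input.jpg`: smooth with two different Gaussian settings, then run `cv2.Canny`, to see the scale effect described above.

```python
import cv2

# Gaussian smoothing followed by Canny edge detection. A larger kernel/sigma
# keeps only large-scale edges; a smaller one keeps fine detail.
img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)

coarse = cv2.Canny(cv2.GaussianBlur(img, (9, 9), sigmaX=3), 50, 150)
fine = cv2.Canny(cv2.GaussianBlur(img, (3, 3), sigmaX=1), 50, 150)

cv2.imwrite('edges_coarse.png', coarse)
cv2.imwrite('edges_fine.png', fine)
```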
Image Gradient
Magnitude: \(\| \nabla f \| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}\)
Direction: \(\theta = \tan^{-1}(\frac{\partial f}{\partial y} / \frac{\partial f}{\partial x})\)
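The magnitude and direction above can be computed with Sobel filters; a small OpenCV/NumPy sketch (again assuming `input.jpg`):

```python
import cv2
import numpy as np

img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE).astype(np.float32)

gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)   # df/dx
gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)   # df/dy

magnitude = np.sqrt(gx**2 + gy**2)               # gradient magnitude
direction = np.arctan2(gy, gx)                   # gradient direction (radians)
```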
Invariant to large rotations and translations, but not invariant to image scale; it does not tell us the scale of the corner.
Each zero-crossing corresponds to an edge.
Laplace operator (Laplacian)
The Gaussian on the right is twice the one on the left.
9/26
Appendix-SIFT
Keypoint Localization



SIFT Descriptor

OpenCV SIFT
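A minimal usage sketch of OpenCV's SIFT (requires OpenCV ≥ 4.4; `input.jpg` is an assumed file name):

```python
import cv2

img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

print(len(keypoints), descriptors.shape)   # N keypoints, each a 128-D descriptor
vis = cv2.drawKeypoints(img, keypoints, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite('sift_keypoints.png', vis)
```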
HoG (Histogram of Oriented Gradients)
HoG
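A small HoG sketch using scikit-image (parameter values are typical defaults, not from the lecture; `input.jpg` is assumed):

```python
from skimage import io
from skimage.feature import hog

img = io.imread('input.jpg', as_gray=True)

features, hog_image = hog(img,
                          orientations=9,          # gradient-orientation bins
                          pixels_per_cell=(8, 8),
                          cells_per_block=(2, 2),
                          visualize=True)
print(features.shape)
```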
LBP (Local Binary Patterns)


LBP is a non-parametric descriptor whose aim is to efficiently summarize the local structures of images.
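A short scikit-image sketch of a uniform LBP texture histogram (parameters are illustrative; `input.jpg` is assumed):

```python
import numpy as np
from skimage import io
from skimage.feature import local_binary_pattern

img = io.imread('input.jpg', as_gray=True)

P, R = 8, 1                                   # 8 neighbours on a radius-1 circle
lbp = local_binary_pattern(img, P, R, method='uniform')
hist, _ = np.histogram(lbp, bins=np.arange(0, P + 3), density=True)
print(hist)                                   # P + 2 bins for 'uniform' LBP
```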
Types of Object Detection
Object Classification
Image Classification Architectures review
ImageNet Dataset
AlexNet

Semantic Segmentation
Sliding Window

Downsampling & upsampling (to reduce the cost of expensive full-resolution convolutions)

U-Net
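A deliberately tiny encoder-decoder sketch in PyTorch to illustrate the downsample/upsample idea with a U-Net-style skip connection; this is not the original U-Net architecture, and all layer sizes are made up:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, in_ch=3, num_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, num_classes, 1)   # per-pixel class scores

    def forward(self, x):
        e = self.enc(x)                              # full-resolution features
        b = self.bottleneck(self.down(e))            # downsampled path
        d = self.up(b)                               # upsample back
        d = self.dec(torch.cat([d, e], dim=1))       # skip connection
        return self.head(d)

out = TinyUNet()(torch.randn(1, 3, 64, 64))
print(out.shape)   # torch.Size([1, 2, 64, 64])
```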
There is no universal agreement in the literature on the definitions of various vision subtasks
Two Main Categories for Generic Object Detection

Region Proposals
R-CNN & Fast R-CNN
[Paper] EDF-SSD: An Improved Feature Fused SSD for Object Detection
10/3
Convolution kernel size: height × width × depth
A squeeze convolution layer (which has only 1×1 filters), feeding into an expand layer that has a mix of 1×1 and 3×3 convolution filters.
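A PyTorch sketch of such a SqueezeNet-style Fire module (channel counts are illustrative):

```python
import torch
import torch.nn as nn

# A 1x1 "squeeze" layer feeding an "expand" layer that mixes 1x1 and 3x3 filters.
class Fire(nn.Module):
    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

y = Fire(96, 16, 64, 64)(torch.randn(1, 96, 55, 55))
print(y.shape)   # torch.Size([1, 128, 55, 55])
```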
Top-1 ImageNet Accuracy: the model is allowed only one guess.
Top-5 ImageNet Accuracy: the model is allowed five guesses.
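A small PyTorch sketch of how top-k accuracy can be computed (random tensors stand in for real predictions):

```python
import torch

def topk_accuracy(logits, targets, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    topk = logits.topk(k, dim=1).indices            # (N, k) predicted class ids
    correct = (topk == targets.unsqueeze(1)).any(dim=1)
    return correct.float().mean().item()

logits = torch.randn(8, 1000)                       # fake ImageNet-style scores
targets = torch.randint(0, 1000, (8,))
print(topk_accuracy(logits, targets, k=1), topk_accuracy(logits, targets, k=5))
```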
The advantage of adding this type of skip connection is that if any layer hurts the performance of the architecture, it can effectively be skipped, since regularization drives its residual branch toward zero.
As a result, very deep neural networks can be trained without the problems caused by vanishing/exploding gradients.
In conclusion, ResNets are among the most effective neural network architectures, as they maintain a low error rate even at much greater depths.
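A minimal residual block sketch in PyTorch showing the identity skip connection; layer sizes are illustrative:

```python
import torch
import torch.nn as nn

# The output is F(x) + x, so if the learned branch F contributes nothing useful,
# the block falls back to the identity mapping.
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)   # skip connection

print(ResidualBlock(64)(torch.randn(1, 64, 32, 32)).shape)
```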
A Simple yet Effective Approach for Identifying Unexpected Road Obstacles
Deep Learning for Generic Object Detection: A Survey
Meanwhile, the center point can be obtained from the four directions. (In practice, there may be more than one extreme point along a given direction.)
Contours with Explicit Encoding
Pros: fast inference and easy to optimize.
Cons: cannot depict the mask precisely and cannot describe objects that have holes in the center.
10/10
National Day holiday, no class.
10/17
Loss function
Solutions in the Literature for Long-Tailed Visual Recognition
Class-Balanced Loss
\[ \rm{CB}(\textbf{p}, y) = \frac{1}{E_{n_{y}}} \mathcal{L} (\textbf{p}, y) = \frac{1 - \beta}{1 - \beta^{n_y}} \mathcal{L}(\textbf{p}, y) \]
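A NumPy sketch of the class-balanced weights \(\frac{1-\beta}{1-\beta^{n_y}}\); the normalization so that the weights sum to the number of classes follows common practice and is an assumption here:

```python
import numpy as np

# Effective number of samples: E_n = (1 - beta^n) / (1 - beta);
# each class is weighted by its inverse, (1 - beta) / (1 - beta^n).
def class_balanced_weights(samples_per_class, beta=0.999):
    samples_per_class = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, samples_per_class)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum() * len(samples_per_class)   # normalize

print(class_balanced_weights([5000, 500, 50, 5]))   # rare classes get larger weights
```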
What is transfer learning?
Transfer learning is about leveraging feature representations from a pre-trained model, so you don't have to train a new model from scratch.
The pre-trained models are usually trained on massive datasets that serve as standard benchmarks in computer vision.
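A typical sketch with torchvision (≥ 0.13 for the `weights` API): load an ImageNet-pretrained backbone, freeze its features, and train only a new classification head. The 10-class head is an arbitrary example:

```python
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained backbone and freeze it.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False               # freeze the pretrained features

model.fc = nn.Linear(model.fc.in_features, 10)   # new head, trained from scratch
```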
Mitigating Dataset Bias (BMVC 2020 Keynote)
Techniques that help deal with data bias
Adversarial domain alignment
10/24
Pixel-space alignment

Few-shot domain translation

Unlike settings with lots of unlabeled target data, here only 1-5 images of the target domain are available.
Disentangled features



Weak Scene-level Alignment


Alignment that respects class boundaries

Category Shift

When categories aren't the same in source and target
Recognition of Static Pose
Recognition of Dynamic Pose
Pose Model
10/31
The Problem of RNN: Short-term Memory
If a sequence is long enough, RNNs have a hard time carrying information from earlier time steps to later ones.
Long Short-Term Memory (LSTM) was created as the solution to short-term memory.
It has internal mechanisms called gates that can regulate the flow of information.
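A minimal PyTorch sketch of an LSTM consuming a batch of sequences; shapes are illustrative:

```python
import torch
import torch.nn as nn

# A single-layer LSTM: the gates inside nn.LSTM decide what to keep
# in the cell state at each time step.
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

x = torch.randn(4, 100, 16)          # batch of 4 sequences, 100 steps, 16 features
output, (h_n, c_n) = lstm(x)

print(output.shape)   # (4, 100, 32): hidden state at every time step
print(h_n.shape)      # (1, 4, 32):  final hidden state
print(c_n.shape)      # (1, 4, 32):  final cell state carried through the gates
```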
RNN Notes
11/7
\(c\) is the context, and the \(y_i\) are the "parts of the data" we are looking at.
\[ m_i = \tanh(W_{cm}c + W_{ym}y_i) \]
The network computes \(m_1, \ldots, m_n\) with a tanh layer.
\[ \mathrm{softmax}(x_1, \ldots, x_n) = \left(\frac{e^{x_i}}{\sum_j e^{x_j}}\right)_i, \qquad z = \sum_i s_i y_i \]
The weights \(s_i\) come from applying this softmax to scalar relevance scores derived from the \(m_i\); the output \(z\) is then the weighted arithmetic mean of all the \(y_i\), where the weights represent the relevance of each variable according to the context \(c\).
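A PyTorch sketch of this soft-attention computation. The projection vector that turns each \(m_i\) into a scalar score is an assumption (the notes do not spell out that step), and all dimensions are made up:

```python
import torch

d_c, d_y, d_m, n = 8, 8, 16, 5
W_cm = torch.randn(d_m, d_c)
W_ym = torch.randn(d_m, d_y)

c = torch.randn(d_c)                 # context vector
y = torch.randn(n, d_y)              # the n "parts of the data"
w = torch.randn(d_m)                 # projects each m_i to a scalar score (assumed)

m = torch.tanh(c @ W_cm.T + y @ W_ym.T)      # (n, d_m) scores from the tanh layer
scores = m @ w                               # (n,) scalar relevance per y_i
s = torch.softmax(scores, dim=0)             # attention weights, sum to 1
z = (s.unsqueeze(1) * y).sum(dim=0)          # weighted mean of the y_i
print(z.shape)
```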
Point Cloud
Mesh
Voxel
Multi-View Images
To cope with the irregular geometric form, different orderings of the points must give a consistent representation.
Permutation invariance: Symmetric function
\[ f(x_1, x_2, \ldots, x_n) \equiv f(x_{\pi_1}, x_{\pi_2}, \ldots, x_{\pi_n}), \quad x_i \in \mathbb{R}^D \]
Examples:
\[ f(x_1, x_2, ..., x_n) = \max\{x_1, x_2, ..., x_n\} \\ f(x_1, x_2, ..., x_n) = x_1 + x_2 + ... + x_n \]
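A PointNet-flavored sketch in PyTorch: a shared per-point MLP followed by a max over points (a symmetric function), so reordering the points does not change the output. Sizes are illustrative:

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    def __init__(self, in_dim=3, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU(),
                                 nn.Linear(feat_dim, feat_dim))

    def forward(self, points):                  # points: (batch, n_points, 3)
        per_point = self.mlp(points)            # shared weights for every point
        return per_point.max(dim=1).values      # symmetric max-pool over points

net = TinyPointNet()
pts = torch.randn(2, 1024, 3)
perm = pts[:, torch.randperm(1024), :]          # reorder the points
print(torch.allclose(net(pts), net(perm)))      # True: permutation invariant
```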
Input Alignment by Transform Network (T-Net)
RNN Notes
Transformer Notes
PyTorch Transformer
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Vision Transformer
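A sketch of the ViT patch-embedding step: a 224×224 image becomes a sequence of 14×14 = 196 patch tokens via a strided convolution (ViT-Base-like sizes, used here only for illustration):

```python
import torch
import torch.nn as nn

# Each 16x16 patch is flattened and projected to a 768-D embedding,
# turning the image into a sequence of "words" for a Transformer encoder.
patch_embed = nn.Conv2d(3, 768, kernel_size=16, stride=16)

img = torch.randn(1, 3, 224, 224)
tokens = patch_embed(img)                    # (1, 768, 14, 14)
tokens = tokens.flatten(2).transpose(1, 2)   # (1, 196, 768) patch tokens
print(tokens.shape)
```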
11/14
Course schedule adjustment
11/21
Computational low-light image enhancement
Peking University - Prof. Jiaying Liu (劉家瑛)
Research topic:
Low-Light Degradation
Intensive noise
Problem: High-level vision in low-light scenarios
Representative work
Deep Retinex Decomposition for Low-Light Enhancement
Benchmarking Low-Light Image Enhancement and Beyond
HLA-Face: Joint High-Low Adaptation for Low Light Face Detection
Self-Aligned Concave Curve: Illumination Enhancement for Unsupervised Adaptation
Multi-view 3D visual modeling of non-diffuse objects with complex materials
Australian National University - Prof. Hongdong Li
Research topic:
Multi-view 3D Reconstruction of a Texture-less Smooth Surface of Unknown Generic Reflectance
image formation + surface regularization + relax penalty
Diffeomorphic Neural Surface Parameterization for 3D and Reflectance Recovery
Consistent, Empathetic and Prosocial Dialogues
Prof. Gunhee Kim
ProsocialDialog: A Prosocial Backbone for Conversational Agents
Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness
Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes
11/28
Context Autoencoder for Scalable Self-Supervised Representation Pretraining
Baidu computer vision expert - Jingdong Wang (王井东)
Vision Foundation Models
Representation Pretraining
Self-Supervised Representation Pretraining in Vision
CAE: representation pretraining aims to learn an encoder, mapping an image to a representation that can be transferred to downstream tasks.
It makes predictions for masked patches from visible patches in the encoded representation space to solve the masked image modeling task.
Figure 1: Context autoencoder
Github repo.
Table 1: Pretraining quality evaluation
Relational and Structural Vision with High-Order Feature Transforms
POSTECH - Minsu Cho
Match and transfer
Relational Self-Attention: What's Missing in Attention for Video Understanding
SPair-73k: A Large-scale Benchmark for Semantic Correspondence
Convolutional Hough Matching Networks
TransforMatcher: Match-to-Match Attention for Semantic Correspondence
Few-shot image segmentation
Structure of correspondence in space
Motion-aware video recognition
Relational Self-Attention
Summary
AURORA - Empirical Bayes from Replicates
Stanford University - Dennis L. Sun
Empirical Bayes mean estimation with nonparametric errors via order statistic regression on replicated data
Estimate some quantity \(\mu_i\) from noisy observations \(\textbf{Z} = \{Z_1, \ldots, Z_N\}\).
Empirical Bayes: First estimate \(A\) using the data, then plug it into the prior.
12/21
Deploying CV at Edge - From Recent Vision Transformer to Future Metaverse
Computing and AI Technology Group, MediaTek Inc.
Part1 Overview
Part2 Deploying Vision transformer at edge
鄭嘉珉, Senior Manager, MediaTek
Focuses on experience sharing, especially on going from CV research to production at MediaTek
NIPS
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Important topic:
Metaverse
Challenge
DNN > Edge Process > Moore's law
姜政銘(Jimmy Chiang)
Edge AI KSF:
Noise reduction, super-resolution
CAI department
AI-ALG: the tasks assigned to the algorithms
AI-SW:
AI-HW:
What needs to be done to get a trained model running on a phone?
Vision transformer
Summary:
Softmax Complexity
Norm-Layer Challenge
MLP-GELU Challenge Overview
What papers might not tell you, but matter in edge AI
The workplace
Paper list
Continuous Space-Time Super-Resolution
Final exam (Open anything)
Disentangling 3D Pose in A Dendritic CNN for Unconstrained 2D Face Alignment
Reference
Request for the e-book version of the textbook