Faculty Application
====
## Research Achievements
- Neural Scene Representation and Rendering for Unknown Cameras [published in ICCV 2021]
Neural scene representation and rendering models aim to build an implicit scene representation from multiple observations and their corresponding camera poses, and to render images from novel viewpoints. Most neural scene rendering models rely on camera projection and ray tracing to achieve geometrically plausible rendering results. In my research, I propose a spatial transformation routing mechanism that abstracts camera projection as a process of passing features between different feature cells, as shown in Figure 1. Unlike models that require precise camera projection parameters, a neural scene model built on this mechanism can take images with unknown camera intrinsics as input and render results that are comparable to, or even better than, those of geometry-based models. Figure ? compares my model with other models.
TODO: add the scene arithmetic experiment and other results.
- Scalable Neural Scene Memory for Navigation [published in AAAI 2023]
The implicit map built by a neural scene representation contains abstract semantic features and can support more flexible, higher-level navigation tasks. Based on the neural scene representation model, I proposed a scene memory network that extracts features from observed images and constructs scalable memory units, as shown in Figure 5. The memory units serve as a scene map and as an additional state of the reinforcement learning model: they store information from previous observations and are read through an attention mechanism, as shown in Figure 2. Additionally, I proposed using the entropy difference of the memory units between two consecutive time steps as a curiosity reward, which encourages the agent to explore unknown areas in environments with sparse rewards. Experimental results demonstrate that the reinforcement learning model with scene memory achieves better results on object collection tasks, and that the curiosity reward also improves the success rate of single-object search.
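The attention-based read from the memory units can be illustrated with a small sketch. It assumes scaled dot-product attention over a flat list of memory cells; the actual attention form used in the scene memory network is not specified here, and the names `attention_read`, `query`, `memory_keys`, and `memory_values` are hypothetical.

```python
import math


def attention_read(query, memory_keys, memory_values):
    """Soft read from memory cells (assumed scaled dot-product attention).

    query:         list of floats, e.g. a feature of the current agent state
    memory_keys:   one key vector per memory cell
    memory_values: one value vector per memory cell
    Returns the attention-weighted value vector and the attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale
              for key in memory_keys]

    # Numerically stable softmax over the memory cells.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the value vectors, one output dimension at a time.
    dim = len(memory_values[0])
    read = [sum(w * value[i] for w, value in zip(weights, memory_values))
            for i in range(dim)]
    return read, weights
```

Because the read is a weighted sum over however many cells exist, the same function works as the memory grows, which is one simple way a scalable memory can be attached to a fixed-size policy input.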
- Autonomous Environment Exploration
Navigation requires a sufficiently accurate map, so exploring the environment efficiently while building such a map is an important problem in robotics. My research focuses on the scenario of constructing 2D grid maps with a 2D lidar sensor and proposes a deep reinforcement learning model for environment exploration. At the local scale, the entropy difference of the grid-map probabilities between two consecutive time steps is used as a reward that encourages exploration. At the global scale, the nearest map frontier is taken as the navigation goal to keep the agent from getting stuck in a local area. Figure 3 illustrates the architecture of the proposed model, and Figure 4 shows that the proposed method achieves higher map completeness than other exploration methods.
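The two signals described above, the local entropy-difference reward and the global nearest-frontier goal, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the research: each grid cell is assumed to hold an occupancy probability, 0.5 is assumed to mean "unknown", a frontier is assumed to be a free cell adjacent to an unknown cell, and the names `exploration_reward`, `nearest_frontier`, and `free_thresh` are hypothetical.

```python
import math
from collections import deque


def cell_entropy(p):
    """Binary entropy (in bits) of one occupancy probability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def map_entropy(grid):
    """Total entropy of a grid map given as a list of rows."""
    return sum(cell_entropy(p) for row in grid for p in row)


def exploration_reward(prev_grid, curr_grid):
    """Local reward: how much map uncertainty was removed in one step."""
    return map_entropy(prev_grid) - map_entropy(curr_grid)


def nearest_frontier(grid, start, free_thresh=0.3, tol=1e-6):
    """Global goal: breadth-first search through free cells from `start`
    (assumed to be free) to the closest cell that touches an unknown cell.
    Returns a (row, col) tuple, or None when no frontier is reachable."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    visited = {start}
    while queue:
        r, c = queue.popleft()
        adjacent = [(r + dr, c + dc)
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
        if any(abs(grid[nr][nc] - 0.5) < tol for nr, nc in adjacent):
            return (r, c)
        for nr, nc in adjacent:
            if (nr, nc) not in visited and grid[nr][nc] < free_thresh:
                visited.add((nr, nc))
                queue.append((nr, nc))
    return None
```

The reward is positive whenever a scan turns unknown cells into confidently free or occupied ones, and the breadth-first search gives the agent a goal even when its immediate surroundings are already fully mapped.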
- Multi-player Trajectory Generation for Basketball [published in ??? and ???]
Virtual reality training systems have been shown to give basketball players a more intuitive sense of spatial relationships and to make learning tactics more efficient. An important part of such a system is simulating how defenders move in different situations so that the movements look realistic yet retain a certain degree of variation. In my research, I proposed two multi-player trajectory generation models. The first model adopts an autoregressive architecture that predicts the next time step from past time steps, and uses WaveNet-like causal convolutions for fast training. Because WaveNet models a discrete distribution whereas player trajectories are continuous, I model the output with a Gaussian distribution and use differentiable sampling to let gradients propagate, as shown in Figure 5. The second model is a variational recurrent autoencoder, as shown in Figure ?. A recurrent encoder first extracts the temporally dependent hidden state of the trajectory at each time step, and an attention mechanism then fuses the hidden states from different time steps, with different attention weights, into a global feature vector. The global feature vector produces the parameters of the Gaussian distribution over the latent variables of the variational autoencoder, and a latent code sampled from this distribution is used as the initial state of the recurrent decoder, which reconstructs the input trajectory. In addition to generating player trajectories, the latent codes can be further used for tactic classification or clustering.
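The differentiable Gaussian sampling mentioned above is commonly realized with the reparameterization trick, which can be sketched as follows. This is an illustration under assumptions rather than the exact formulation of the proposed models: the output is assumed to be a diagonal Gaussian parameterized by a mean and a log standard deviation per coordinate, and the names `sample_position` and `sample_gradients` are hypothetical.

```python
import math
import random


def sample_position(mean, log_std, rng):
    """Reparameterized sample: x = mean + exp(log_std) * eps, eps ~ N(0, 1).

    The randomness lives entirely in `eps`, so `x` is a deterministic,
    differentiable function of `mean` and `log_std`.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in mean]
    sample = [m + math.exp(s) * e for m, s, e in zip(mean, log_std, eps)]
    return sample, eps


def sample_gradients(log_std, eps):
    """Analytic partial derivatives of each sampled coordinate with
    respect to its own mean and log standard deviation."""
    d_mean = [1.0 for _ in eps]
    d_log_std = [math.exp(s) * e for s, e in zip(log_std, eps)]
    return d_mean, d_log_std
```

For example, `sample_position([1.0, 2.0], [0.0, 0.0], random.Random(0))` draws one 2D position around (1, 2) with unit standard deviation. In an autoregressive model, the sampled position would be fed back as the next input, and these derivatives are what allow the loss on later time steps to reach the predicted distribution parameters.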
## Research Directions
My current research interests include 3D computer vision, reinforcement learning and scene understanding. There are a few possible directions for my future research:
- Scene Reconstruction and Neural Rendering
Scene reconstruction based on optimizing geometric objectives has long been studied in computer vision. Compared to traditional geometric optimization models, combining generative models with neural rendering models can achieve better rendering results and construct scene representations that contain abstract semantic information. These representations can be used in reinforcement learning or as pre-trained models for scene understanding. In my doctoral dissertation, I built a general neural scene memory model that constructs scalable scene memory and applied it to navigation tasks in simple scenes. In the future, I plan to extend this model by training it on larger and more complex scenes and by optimizing its memory usage and computational cost, so that it can run in photorealistic simulated environments or real-world scenes for large-scale scene construction.
- Image-goal Navigation
In image-goal navigation tasks, an image of the target scene or object is given as input, and the agent is guided to the location shown in the target image through action decisions. In contrast to traditional navigation algorithms, which rely on precise geometric maps and localization information, image-goal navigation algorithms can adapt to low-precision maps or even map-free scenarios, and are more intuitive for human interaction. I plan to employ maps based on neural scene representations, or to construct topological maps that store abstract features, to improve performance on image-goal navigation tasks.
- Cross-modal Learning for Language and Scene Representation
With the development of large-scale language models and generative models, current models can generate highly realistic images from text descriptions. However, most of these models learn only the distribution of 2D images and lack an understanding of the spatial relationships in 3D scenes. To address this issue, we can combine language models with scene representation models to learn the correlation between language and spatial features. Beyond rendering plausible scenes, such a model could learn the concepts behind spatial expressions in language and the change of perspective between different grammatical persons, e.g. "in front of him", "below me", or "A is to the left of B". This can help us construct more general foundation models.
- Spatial-aware World Model
In model-based reinforcement learning, a future prediction model is constructed to iteratively sample future states for action planning and value estimation. In recent years, many models have been proposed to capture the temporal changes of states, enabling video generation or the simulation of entire games. I plan to combine 3D spatial models with temporal prediction models to predict observation images that are spatially consistent, and to apply the resulting model to spatially related reinforcement learning tasks.
- Visual Commonsense Learning for Embodied AI
Embodied intelligence focuses on the interaction between an agent and its environment. Instead of relying on offline learning from large collected datasets, it learns directly from the observations obtained by interacting with the environment. I plan to combine environment exploration with neural rendering models so that the agent jointly learns two things: how to acquire spatial concepts from observed images, including localization and scene memory construction, and how to explore the environment to collect data that benefits the learning of those spatial concepts.
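## Cover Letter
Dear Department Chair,
I, 陳文正, am currently a postdoctoral researcher in the Department of Computer Science at National Tsing Hua University, and I am writing to apply for the assistant professor position in your department. My research expertise covers machine learning and deep learning, reinforcement learning, computer vision and pattern recognition, and robotic navigation and exploration. After receiving my Ph.D. in 2023, I joined the Department of Computer Science at National Tsing Hua University as a postdoctoral researcher. To date I have published 11 papers (1 international journal article and 10 international conference papers). During my doctoral studies, I assisted professors in designing teaching materials and assignments for several courses, served in 2018 as a lecturer for the smart certification class on applied deep vision technology offered by the Institute of Manufacturing at National Cheng Kung University, and participated in 5 Ministry of Science and Technology and industry-academia projects.
I was glad to learn of the opening from your department's website. Your department has long been held in high regard by colleagues in academia for its work in information and communication technology. For family and career-planning reasons, I am also looking for a faculty position in northern Taiwan, and I sincerely hope to have the opportunity to join your department. I have enclosed my curriculum vitae, publication list, and related application materials, and I hope you will grant me an interview.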
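## Teaching Plan
My main research areas are computer vision, robotic navigation, and machine learning. The courses in my area of expertise are therefore planned to start from the mathematical foundations of computer vision and machine learning, extend step by step to vision-based navigation algorithms, and then introduce more advanced spatially related machine learning models (such as depth map prediction, deep-learning-based SLAM systems, and neural rendering). Among them, the three courses "Digital Image Processing", "Computer Vision and Image Recognition", and "Probabilistic Machine Learning" aim to build students' foundational mathematical knowledge, while "Robotic Navigation and Exploration" and "3D Deep Learning" are positioned as research-oriented courses that cover the latest research results in the field and are supplemented with hands-on projects or competitions, developing students' ability to go from theoretical understanding to implementation.
### Digital Image Processing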
#### Course Topics:
- Digital Image Fundamentals
- Intensity Transformations and Spatial Filtering
- Filtering in the Frequency Domain
- Image Restoration and Reconstruction
- Color Image Processing
- Wavelets and Multi-resolution Processing
- Image Compression
- Morphological Image Processing
- Image Segmentation
- Representation and Description
- Selected Problems in Digital Image Processing
#### Textbook or References:
- R. C. Gonzalez and R. E. Woods, Digital Image Processing, Prentice Hall, 3rd edition, 2008.
#### Grading (subject to changes):
- Exam: 25%
- Assignments: 50%
- Final project: 25%
### Probabilistic Machine Learning
#### Course Topics:
- Machine Learning Basics
- Probability and Bayesian Theory
- Linear Regression and Classification
- Support Vector Machine
- Clustering
- Dimension Reduction
- Probabilistic Graphical Models
- Deep Neural Network
- Deep Generative Model
#### Textbook or References:
- Pattern Recognition and Machine Learning. Christopher M. Bishop
#### Grading (subject to changes):
- Assignments: 50%
- Paper Presentation: 20%
- Final Project: 30%
### Robotic Navigation and Exploration
#### Course Topics:
- Introduction to Robotic Navigation
- Kinematic Model
- Feedback Control of Path Tracking
- Path Planning
- SLAM Backend – Filter-based SLAM
- SLAM Backend – Graph-based SLAM
- SLAM Frontend – Lidar and Point Cloud
- SLAM Frontend – Multiview Geometry
- Modern SLAM
- 3D Deep Learning Model
- Reinforcement Learning
#### Textbook or References:
- Introduction to Autonomous Mobile Robots. 2nd Edition. Davide Scaramuzza, Roland Siegwart and Illah Reza Nourbakhsh
#### Grading (subject to changes):
- Assignments: 60%
- Competition: 40%
### Computer Vision and Image Recognition
#### Course Topics:
- Image Processing
- Feature Detection and Matching
- Camera Model and Calibration
- Geometric Transformation
- Structure from Motion
- Deep Learning for Computer Vision
- Image Classification
- Object Detection
- Segmentation
#### Textbook or References:
- Computer Vision: Algorithms and Applications. Richard Szeliski
#### Grading (subject to changes):
- Assignments: 50%
- Paper Presentation: 20%
- Final Project: 30%
### 3D Deep Learning
#### Course Topics:
- Deep Network and Generative Model
- 3D Scene Representation
- 3D Convolutional Network and Graph Network
- Point Cloud Classification, Detection and Segmentation
- Point Cloud Generation
- Novel View Synthesis
- Neural Scene Representation and Rendering
- Neural Radiance Field
- Depth and Optical Flow Estimation
- Pose Estimation and Deep Visual SLAM
#### Grading (subject to changes):
- Assignments: 50%
- Paper Presentation: 20%
- Final Project: 30%
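## Recommendation Letter
### Professor Hu
I am the doctoral advisor of Dr. 陳文正 and have known Dr. Chen for more than eight years. During the five years in which I supervised his doctoral dissertation, I saw not only his passion for research but also his tactful, friendly, responsible, and proactive attitude in daily life. After graduating, 文正 took a position in the Department of Computer Science and Information Engineering at National Cheng Kung University, where the outstanding abilities he has shown in teaching, research, and service have all been impressive. Therefore, as his doctoral advisor, I am pleased to strongly recommend Dr. Hu to your department for a full-time faculty position.
During Dr. Hu's doctoral studies, Dr. Hu led junior members of my research group in brainstorming, and together they developed a number of interesting and practical multimedia systems. With an earnest, rigorous, and humble attitude, together with the ability to think independently and to collaborate and integrate, 敏君 has published many papers in well-known international conferences, journals, and book series in multimedia-related fields, a fruitful record that deserves recognition. Dr. Hu and I are currently also jointly carrying out the Ministry of Science and Technology's "Project for Research, Development, and Application of Forward-looking Digital Economy Technologies". The research team led by Dr. Hu is highly motivated and takes great care in collecting and organizing research data. The team not only develops novel algorithms but also puts considerable effort into system integration and interface design, so that the research outcomes go beyond papers and can be adopted by industry. 敏君 loves sports and hopes to combine her knowledge of computing and sports to develop practical software tools for sports fans. Her main research themes are therefore semantic content analysis of sports videos and the development of virtual reality systems for sports training. She does not confine her research to these topics, however, and has also applied her expertise in computer vision, computer graphics, multimedia retrieval, virtual reality, and augmented reality to systems for intelligent surveillance video analysis, home video analysis, photo presentation, and motion instruction. Throughout this work, 敏君 has fully demonstrated her research creativity and enthusiasm and has achieved good results, which gives me full confidence in her professional knowledge of computing.
Dr. Hu has organized or co-organized major international and domestic conferences in the multimedia field on many occasions, interacts frequently with scholars abroad, and currently has an active research collaboration with the Hirose Tanikawa Narumi Lab at the University of Tokyo in Japan. 敏君 has carried out a number of Ministry of Science and Technology and Ministry of Education projects as well as industry-academia collaboration projects. In teaching, she has twice received the Excellent Teacher award of the College of Electrical Engineering and Computer Science at National Cheng Kung University, and she has helped the university plan a number of interdisciplinary courses and programs. Even when faced with complicated tasks, she completes them responsibly and methodically, and her selfless dedication to the common good has been widely recognized. In addition, 敏君 interacts and collaborates very well with others. When discussing research projects, she actively puts forward her own views; when more contentious disputes arise over academic matters or work assignments, she always puts the interests of the group first while remaining considerate of her collaborators, showing excellent judgment that I greatly admire.
Based on the points above, I consider Dr. 胡敏君 an outstanding researcher with great potential in our country and certain to be an excellent colleague and collaborator, well worth encouraging and cultivating. I therefore once again sincerely recommend her to join the distinguished ranks of your department.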