Internship Report

# Final Report: Camera Shift Detection Using Convolutional Neural Networks

#### Sumedh Deshkar, Indian Institute of Technology Kharagpur

### Introduction:

Over the course of the internship, I was tasked with developing a novel approach to detect camera shifts using CNNs. The goal was to create a robust and generalizable solution that could accurately identify when the camera had been moved.

In the initial phase, I studied the research paper "DiffPoseNet: Direct Differentiable Camera Pose Estimation" to understand state-of-the-art techniques in camera pose estimation. This paper served as the foundation for my work: it highlighted the importance of robustness and generalization in camera pose estimation and introduced me to the concept of optical flow.

### Month 1: Research and Paper Summary:

* During the first month, I thoroughly studied the DiffPoseNet paper. Its key design goals are robust and generalizable camera pose estimation, direct normal flow estimation using NFlowNet, and end-to-end learning of camera pose. The paper uses the TartanAir dataset for training and evaluation and describes the architecture of NFlowNet and PoseNet, the two main components of its approach. Its discussion of the optimization layers and of implementation details (Python 3.7 and PyTorch 1.9) provided essential guidance. Importantly, it emphasizes the robustness of the DiffPoseNet approach compared to other methods, especially under noise and across varying datasets.
* The part of the paper most useful for my later work was its comparison of optical flow algorithms such as NFlowNet, SelFlow, LiteFlowNet, and PWCNet. NFlowNet produced the lowest errors among them, and the paper builds its approach on NFlowNet.

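To make the term "normal flow" concrete: it is the component of optical flow along the image gradient, which follows from the brightness constancy equation `Ix*u + Iy*v + It = 0`. NFlowNet learns this quantity with a network; the sketch below is only the classical gradient-based definition, and the function name `normal_flow` and the `eps` threshold are illustrative assumptions, not taken from the paper or from the internship code.

```python
import numpy as np


def normal_flow(prev, curr, eps=1e-6):
    """Classical normal flow between two grayscale frames.

    prev, curr: 2D arrays of the same shape (hypothetical inputs).
    Returns an (H, W, 2) array holding the (u, v) normal flow per pixel.
    """
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)

    # np.gradient returns the derivative along axis 0 (rows, y) first,
    # then along axis 1 (columns, x).
    Iy, Ix = np.gradient(prev)
    It = curr - prev  # temporal derivative with a unit time step

    grad_sq = Ix ** 2 + Iy ** 2
    # Normal flow is -It * grad(I) / |grad(I)|^2. It is undefined where the
    # gradient vanishes (textureless pixels), so those pixels are set to zero.
    scale = np.where(grad_sq > eps, -It / np.maximum(grad_sq, eps), 0.0)
    return np.stack([scale * Ix, scale * Iy], axis=-1)
```

For a horizontal intensity ramp that moves one pixel to the right, this returns a flow of one pixel in `u` and zero in `v`. Textureless regions yield no estimate, which is the same weak-texture limitation that affects classical optical flow.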
### Month 2: Literature Review:

* During the second month, I conducted an in-depth literature review to understand the existing approaches in camera pose estimation. I observed that geometric methods had already been extensively explored, which prompted us to focus on improving the performance of deep learning models for this task. Specifically, I analyzed models for both relative camera pose estimation (MeNet) and absolute camera pose estimation (PoseNet, MapNet).
* Concurrently, I began data preparation: extracting 3D points from the video dataset using COLMAP reconstruction and generating train, test, and validation datasets to support model development and evaluation. This month laid the foundation for our subsequent work in camera shift detection.

![](https://hackmd.io/_uploads/HJKv5cZk6.png)

### Month 3: Optical Flow Exploration:

* In the third month, guided by the goal of detecting camera shifts, I explored the concept of optical flow. One notable paper I encountered discussed the RAFT algorithm for dense optical flow estimation. It presented a method to extract features using CNNs with shared weights.
* I also studied the classical Lucas-Kanade method and noted the limitations of optical flow in situations such as weakly textured images.
* To create optical flow data, I processed video sequences obtained from the camera, generating ".npy" files containing images. During this phase, I also discovered a repository that focused on vehicle speed estimation using optical flow and deep learning, which inspired some aspects of our approach.

### Month 4: Optical Flow-Based Camera Shift Detection:

In the fourth month, I devised a novel approach for camera shift detection based on optical flow data. The key insight was that when the camera shifts, the textured regions of the image exhibit flow vectors (u, v) that point in a consistent direction. This effect is especially pronounced because of the small time interval between consecutive frames.

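The key insight can be illustrated with a small hand-written check on a dense flow field. This is a minimal sketch and not the learned classifier described in this report: the function names, the `(H, W, 2)` flow layout, and all thresholds are assumptions chosen for illustration. It measures how much of the frame is moving (coverage) and how well the moving pixels agree on a direction (coherence).

```python
import numpy as np


def global_shift_score(flow, min_magnitude=0.5):
    """Return (coverage, coherence) for a dense flow field.

    flow: (H, W, 2) array of per-pixel (u, v) vectors (assumed layout).
    coverage: fraction of pixels whose flow magnitude exceeds min_magnitude.
    coherence: mean resultant length of the unit direction vectors of the
               moving pixels; 1.0 means they all point the same way.
    """
    u = flow[..., 0].ravel()
    v = flow[..., 1].ravel()
    mag = np.hypot(u, v)
    moving = mag > min_magnitude
    if not moving.any():
        return 0.0, 0.0
    coverage = float(moving.mean())
    ux = u[moving] / mag[moving]
    vy = v[moving] / mag[moving]
    coherence = float(np.hypot(ux.mean(), vy.mean()))
    return coverage, coherence


def looks_like_camera_shift(flow, min_coverage=0.6, min_coherence=0.9):
    """Heuristic: most of the frame moves, and it moves in one direction."""
    coverage, coherence = global_shift_score(flow)
    return coverage >= min_coverage and coherence >= min_coherence


if __name__ == "__main__":
    # Synthetic example: the whole frame moves 3 pixels to the right.
    camera_flow = np.zeros((100, 100, 2))
    camera_flow[..., 0] = 3.0

    # Synthetic example: only a small object moves inside a static frame.
    object_flow = np.zeros((100, 100, 2))
    object_flow[40:60, 40:60, 0] = 3.0

    print(looks_like_camera_shift(camera_flow))
    print(looks_like_camera_shift(object_flow))
```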
I labeled the numpy files as '1' and '0' to indicate the presence and absence of camera shifts, respectively. Notably, even when objects moved within the field of view, this approach could still detect camera shifts, as long as the object motion did not coincide with the camera movement.

### Evaluation and Optimization:

* I evaluated our camera shift detection approach using optical flow data from three different scenes: a parking area, a hostel corridor, and the IIT Kharagpur main building. The results demonstrated that our method effectively distinguished between optical flow patterns caused by camera shifts and those due to object motion.
* Moreover, I referred to a research paper on BrightFlow, which addresses errors stemming from the assumption of constant brightness between frames, to improve the accuracy of our optical flow calculations.

![](https://hackmd.io/_uploads/ByyIvqZka.png)

## Conclusion:

* In conclusion, this research internship has led to the development of a camera shift detection approach that leverages optical flow and deep learning techniques.
* This internship has provided valuable insights into the intersection of deep learning, computer vision, and camera pose estimation, contributing to my understanding of how to tackle complex challenges in the field.