# Video Understanding for Device Quality
This final project is a testing framework that applies video understanding techniques to embedded-device product quality.
# Abstract
"Video Understanding" is an umbrella term for the extraction and analysis of information from videos. A crucial task of video understanding is to recognize and localize (in space and time) the different actions or events appearing in a video. Well-known domains of video understanding include video classification, action detection, dense captioning, multiview and multimodal learning [1][2], activity recognition, and action forecasting.
I would like to extend the testing and evaluation capabilities of the Google Pixel mobile device into the domain of video understanding. The study and research insights gained from better utilization of videos may enable more meaningfully structured issue reports, measurable and explainable video defects, and automated video screening for a plethora of embedded-device enhancement applications. I propose three areas of study for "Video Understanding for Device Quality":
1. Video Quality Assessment as a key enhancement indicator for mobile device development
2. User screen-recording-based system integration reports that isolate the Android operating system logs indicative of a given issue
3. Automated Video Analysis of critical user action sequences for smart device client experience improvements
# Project Course Relevance
I plan to combine this final project with my independent study with Professor Longin Jan Latecki. Professor Latecki is currently advising my university research with Google Taiwan's Pixel team, where I am taking on a number of projects. Having accomplished sound video classification on the data Google provided, I now need to automate the correlation of video streams with the underlying Android operating system calls. This will provide a better testing suite for the hardware engineers at Google Taiwan to improve the Google Pixel device. There are many additional enhancements planned for the broader university research with Google, but for the sake of scope management, the only task I envision accomplishing here is the Android log analysis with video streams.
I will need to muster all the resourcefulness and problem-solving skills I have gained throughout the years, along with the vast Python ecosystem of frameworks, to undertake this study.
# Deliverables
1. Research and design an appropriate video-to-Android-log (text) classification framework
2. Identify and differentiate the necessary hardware and software tools needed to streamline a presentable MVP
3. Extract text-based multiclass labels from the Android Perfetto [7], Winscope [8], and Logcat [6] command-line logging. First, a mature label-extraction analysis needs to be conducted; a multimodal architecture design can then be formulated.
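As a minimal sketch of the label-extraction step in deliverable 3, the following parses lines in Logcat's standard `threadtime` format into (priority, tag) pairs that could serve as multiclass labels. The sample lines and the (priority, tag) label scheme are illustrative assumptions for this proposal, not a finalized labeling design.

```python
import re

# Logcat "threadtime" format: date time PID TID priority tag: message
LOGCAT_LINE = re.compile(
    r"^(?P<date>\d{2}-\d{2})\s+(?P<time>[\d:.]+)\s+"
    r"(?P<pid>\d+)\s+(?P<tid>\d+)\s+"
    r"(?P<priority>[VDIWEF])\s+(?P<tag>[^:]+):\s+(?P<message>.*)$"
)

def extract_labels(lines):
    """Turn raw logcat lines into (priority, tag) label tuples.

    Lines that do not match the expected format (e.g. buffer
    dividers) are skipped.
    """
    labels = []
    for line in lines:
        m = LOGCAT_LINE.match(line)
        if m:
            labels.append((m.group("priority"), m.group("tag").strip()))
    return labels

# Illustrative sample lines (not real device output).
sample = [
    "08-09 12:34:56.789  1234  5678 E SurfaceFlinger: frame dropped",
    "08-09 12:34:57.001  1234  5678 I ActivityManager: Displayed com.example/.Main",
    "--------- beginning of main",  # logcat divider, carries no label
]
print(extract_labels(sample))
# → [('E', 'SurfaceFlinger'), ('I', 'ActivityManager')]
```

A mature version of this step would also draw on Perfetto's FrameTimeline and Winscope traces, which are structured rather than line-oriented, so each source would need its own extractor feeding a common label schema.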
## Good Outcome
* Extract log labels from the Android operating system
* Use the new labels to train a video classification model
* Demonstrate basic classification of the new log classes with the video classification model
* Show results in terms of graphs, tables, and system architecture
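Before the extracted log labels can train a video classification model, they must be mapped to integer class indices. The following is a minimal sketch of that vocabulary-building step using the standard library; the `min_count` rare-label cutoff and the (priority, tag) label tuples are illustrative assumptions.

```python
from collections import Counter

def build_label_vocab(log_labels, min_count=1):
    """Map each distinct log-derived label to an integer class index.

    log_labels: iterable of hashable labels, e.g. (priority, tag) tuples.
    min_count: drop labels rarer than this (an illustrative knob for
               pruning noisy, infrequent log tags before training).
    """
    counts = Counter(log_labels)
    kept = sorted(label for label, c in counts.items() if c >= min_count)
    return {label: idx for idx, label in enumerate(kept)}

def encode(log_labels, vocab):
    """Encode labels as class indices; unknown labels map to -1."""
    return [vocab.get(label, -1) for label in log_labels]

# Illustrative labels, as a logcat parsing step might produce them.
labels = [("E", "SurfaceFlinger"), ("I", "ActivityManager"), ("E", "SurfaceFlinger")]
vocab = build_label_vocab(labels)
print(vocab)        # deterministic: labels are sorted before indexing
print(encode(labels, vocab))
```

Sorting the kept labels before assigning indices keeps the class mapping deterministic across runs, which matters when the video model's output head is sized to `len(vocab)`.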
## Better Outcome
* Incorporate a real-time log stream from an Android device into the machine learning model, and demonstrate the system's performance through a terminal-based interface
* Demonstrate with a Google Pixel phone, a laptop, and defect simulation on the Pixel
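The real-time streaming piece could be sketched as a generator that pairs each incoming log line with a model prediction. In practice the lines would come from an `adb logcat` pipe; here the function accepts any iterable of lines so it can be exercised without a device, and the classifier is a stand-in placeholder, not the actual trained model.

```python
# In a live setup, lines could come from something like:
#   proc = subprocess.Popen(["adb", "logcat"], stdout=subprocess.PIPE, text=True)
#   stream_predictions(proc.stdout, model.classify)

def stream_predictions(lines, classify):
    """Yield (line, prediction) pairs for each incoming log line.

    lines:    any iterable of decoded log lines (e.g. an adb logcat pipe).
    classify: callable mapping one line to a prediction; a real system
              would invoke the trained video/log model here.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank keep-alive lines
        yield line, classify(line)

# Placeholder classifier: flags error-priority lines (illustrative only).
def toy_classifier(line):
    return "defect" if " E " in f" {line} " else "normal"

sample = [
    "08-09 12:34:56.789  1234  5678 E SurfaceFlinger: frame dropped",
    "08-09 12:34:57.001  1234  5678 I ActivityManager: Displayed com.example/.Main",
]
for line, pred in stream_predictions(sample, toy_classifier):
    print(pred, "<-", line)
```

Keeping the transport (adb subprocess) separate from the prediction loop makes the terminal-based demo testable offline and lets the same loop later drive the defect-simulation demonstration.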
## Best Outcomes
* End-to-end system call analysis of the Android operating system with added video understanding capabilities
# Resources & References
[1] J. Summaira, X. Li, A. Shoib, S. Li and J. Abdul, "Recent Advances and Trends in Multimodal Deep Learning: A Review", arXiv.org, 2022. [Online]. Available: https://arxiv.org/abs/2105.11087. [Accessed: 09- Aug- 2022].
[2] T. Baltrušaitis, C. Ahuja and L. Morency, "Multimodal Machine Learning: A Survey and Taxonomy", arXiv.org, 2022. [Online]. Available: https://arxiv.org/abs/1705.09406. [Accessed: 09- Aug- 2022].
[3] R. Tao, E. Gavves, and A. W. Smeulders, "Siamese Instance Search for Tracking", arXiv preprint arXiv:1605.05863, 2016.
[4] L. Wang, W. Ouyang, X. Wang, and H. Lu, "Visual Tracking with Fully Convolutional Networks", in ICCV, 2015.
[5] H. Wu et al., "FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling", arXiv preprint arXiv:2207.02595, 2022.
[6] "Logcat command-line tool | Android Developers", Android Developers, 2022. [Online]. Available: https://developer.android.com/studio/command-line/logcat#Overview.
[7] "Android Jank detection with FrameTimeline - Perfetto Tracing Docs", Perfetto, 2022. [Online]. Available: https://perfetto.dev/docs/data-sources/frametimeline.
[8] "Tracing Window Transitions | Android Open Source Project", Android Open Source Project, 2022. [Online]. Available: https://source.android.com/docs/core/graphics/tracing-win-transitions.
[9] S. B. Kotsiantis et al., "Data Preprocessing for Supervised Learning", CiteSeerX, 2021. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.8413&rep=rep1&type=pdf.
[10] S. Akramullah, "Video Quality Metrics", in Digital Video Concepts, Methods, and Metrics, Apress, Berkeley, CA, 2014. [Online]. Available: https://doi.org/10.1007/978-1-4302-6713-3_4.
[11] "GitHub - logpai/loghub: A large collection of system log datasets for log analysis research", GitHub, 2022. [Online]. Available: https://github.com/logpai/loghub.
[12] S. He, J. Zhu, P. He, and M. Lyu, "Loghub: A Large Collection of System Log Datasets towards Automated Log Analytics", arXiv.org, 2022. [Online]. Available: https://arxiv.org/pdf/2008.06448.pdf.
[13] "LogPAI", Logpai.com, 2022. [Online]. Available: http://logpai.com/.
[14] "2.7. Novelty and Outlier Detection", scikit-learn, 2022. [Online]. Available: https://scikit-learn.org/stable/modules/outlier_detection.html.