---
title: 'project_outline'
disqus: hackmd
---
Project Outline: Wheelock Video Analysis
===
### _Authors: Xinzhe Jian, Frank Pacini, Sichao Yu, Zhihui Zhan, Ariel Lee_
________
## Table of Contents
[TOC]
## Overview
#### 1. Situation and current issues
Large-scale evaluation of classroom teaching is difficult and time-consuming. Finding enough human evaluators to assess thousands of videos is infeasible. Machine learning can improve assessment efficiency and assist evaluators by preprocessing videos. The first step is to devise an algorithm that analyzes the use of teaching time based on video data collected from classroom observations. We then need to build a model that generalizes to classify “idle time” in unseen video footage (i.e., data not used during training or validation).
#### 2. Key Questions
* How can we define “idle” time?
* How much "idle time" for students exists within each video?
* Looking forward, how can we study “idle time” in relation to grade, subject, and teaching approach?
#### 3. Hypothesis (Approach)
* Identify and clean a subset of the sample videos for analysis
* Work with WEPC to define "idle time"
* Create a video-analysis approach (speech-to-text, crowd-noise detection, action recognition, etc.) that reliably identifies instances of "idle time" in a video
* Understand how calculations of "idle time" track with other known variables about the classroom (e.g., subject area, % English Learners in the class, etc.)
#### 4. Impact
An automatic approach to identify "idle time" will help improve the efficiency of teacher evaluations by providing an additional metric regarding the usage of classroom time. It will give an intuitive impression of the use of teaching time and reduce the workload of human evaluators. It could also be used for further assessments of classroom videos, such as understanding parts of teaching time which have been previously difficult to categorize, or comparing teaching methods based on "idle time" to promote more efficient teaching styles.
### [A] Problem Statement
#### 1. Goal
Create a machine learning approach that will detect and quantify “idle time” in video input using the [data set](https://hackmd.io/ib3HCOB4SKSVuzTj5s5Ogw#B-Checklist-for-project-completion).
#### 2. Approach
Develop a deep learning model to automatically detect “idle time” using audio and video data. We have started compiling a draft of research solutions in the [references](https://hackmd.io/ib3HCOB4SKSVuzTj5s5Ogw#B-Checklist-for-project-completion) section, which will be expanded as the project progresses. Additionally, we will use the Shared Computing Cluster (SCC) at BU to store data and train models.
#### 3. Concerns
The final solution should not be used to evaluate instructors or students without direct human supervision and/or interaction. The existence of “idle time” does not always indicate a problem with teaching: for example, if an instructor is having technical difficulties with their computer, our algorithm may classify that period as “idle time”. This is an ethical pitfall that developers and users should guard against when applying the solution to any classroom videos.
### [B] Checklist for project completion
- [ ] Detailed project execution plan
- [ ] EDA report
- [ ] Model prototype
- [ ] Model evaluation report
- [ ] Final model
### [C] From human action to an AI-automated solution
1. Human evaluator watches video from beginning to end.
2. If at any point the teacher and students stop talking for an extended period (5-10 seconds), and the teacher has not just asked a question, the reviewer takes note of the time. Students sitting and doing nothing or moving around aimlessly would confirm to the evaluator that this is "idle time" rather than a solitary activity like computer use or reading.
3. Otherwise, if students are talking among themselves, the teacher isn’t speaking, and the teacher didn’t just start a classroom discussion, the evaluator also marks the time.
4. When the teacher starts talking again, the difference between the marked start time and this end time is the duration of "idle time".
5. Possible indicators of "idle time" are:
* No one speaking
* Student conversations not initiated by the teacher
> This will be expanded upon and clarified during the next stakeholder meeting
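The manual rules above can be sketched as an interval-finding pass over a per-second activity timeline. This is a minimal sketch under stated assumptions: the field names (`teacher_speaking`, `students_speaking`, `teacher_initiated`), the one-second granularity, and the 5-second threshold are all illustrative, not a final design, and the question-asking exception from step 2 is omitted for brevity.

```python
# Sketch: find "idle time" intervals from a per-second activity timeline.
# Each entry describes one second of video; field names are assumptions.

def find_idle_intervals(timeline, min_gap=5):
    """Return (start, end) second pairs where no one is speaking, or
    students talk without a teacher-initiated discussion, for at least
    `min_gap` consecutive seconds."""
    intervals = []
    start = None
    for t, sec in enumerate(timeline):
        idle = (not sec["teacher_speaking"]
                and (not sec["students_speaking"]
                     or not sec["teacher_initiated"]))
        if idle and start is None:
            start = t                      # a candidate idle period begins
        elif not idle and start is not None:
            if t - start >= min_gap:       # keep only sufficiently long gaps
                intervals.append((start, t))
            start = None
    if start is not None and len(timeline) - start >= min_gap:
        intervals.append((start, len(timeline)))
    return intervals

# Toy example: 2 s of teacher talk followed by 6 s of silence.
timeline = (
    [{"teacher_speaking": True, "students_speaking": False, "teacher_initiated": False}] * 2
    + [{"teacher_speaking": False, "students_speaking": False, "teacher_initiated": False}] * 6
)
print(find_idle_intervals(timeline))  # [(2, 8)]
```

The same pass works whether the timeline comes from a human annotator or from model predictions, which makes it a useful common format for comparing the two.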
### [D] Path to operationalization
1. User uploads a teaching video in an internal evaluation portal.
2. Video is sent to model hosted on a server in a request.
3. Model outputs a series of timestamp pairs marking "idle time" along with a confidence score (probability) that the marked period is indeed "idle time".
4. Server returns the marked times, the total "idle time", and an "idle time" score for the video that accounts for the confidence scores.
5. Data is displayed to the user to aid evaluation.
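The aggregation in step 4 could look like the following sketch. The confidence-weighted score (expected idle seconds) is one plausible choice of scoring formula, assumed here for illustration rather than specified by the project:

```python
def summarize_idle_time(segments):
    """segments: list of (start_sec, end_sec, confidence) triples
    returned by the model. Returns the raw idle-time sum and a
    confidence-weighted score (expected idle seconds)."""
    total = sum(end - start for start, end, _ in segments)
    weighted = sum((end - start) * conf for start, end, conf in segments)
    return {"idle_seconds": total, "idle_score": weighted}

# Example: two predicted idle segments with different confidences.
segments = [(120, 180, 0.9), (600, 630, 0.5)]
print(summarize_idle_time(segments))
# {'idle_seconds': 90, 'idle_score': 69.0}
```

Returning both numbers lets the portal show the raw total while using the weighted score to rank or flag videos where the model is more certain.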
Resources
---
### [A] Data Set
A library of more than 10,000 videos of classroom observations with corresponding information such as the number of students, subject, and grade.
### [B] References & Solutions
[Real-Time Facial Expression Recognition Using Deep Learning with Application in the Active Classroom Environment](https://www.mdpi.com/2079-9292/11/8/1240)
[Classroom Learning Status Assessment Based on Deep Learning](https://www.hindawi.com/journals/mpe/2022/7049458/)
[Lecture quality assessment based on the audience reactions using machine learning and neural networks](https://www.sciencedirect.com/science/article/pii/S2666920X21000163)
[Attentive or Not? Toward a Machine Learning Approach to Assessing Students’ Visible Engagement in Classroom Instruction](https://link.springer.com/article/10.1007/s10648-019-09514-z)
[Research on Intelligent Recognition Algorithm of College Students’ Classroom Behavior Based on Improved SSD](https://ieeexplore.ieee.org/abstract/document/9807756/keywords#keywords)
[Audio processing to Mel Spectrograms](https://towardsdatascience.com/audio-deep-learning-made-simple-part-2-why-mel-spectrograms-perform-better-aad889a93505)
Absence-of-speech detection:
1. Use an AudioSet classification model and mark any extended period not classified as speech
2. Apply noise reduction with [rnnoise](https://github.com/xiph/rnnoise), then detect silence by low sound level (noise reduction may be necessary for transcription anyway)
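Option 2's low-sound-level check can be prototyped as a short-time RMS energy threshold on the (denoised) audio samples. The frame size and threshold below are illustrative assumptions that would need tuning on real classroom audio:

```python
import math

def silent_frames(samples, frame_len=1600, threshold=0.01):
    """Split audio samples (floats in [-1, 1]) into fixed-size frames
    and flag frames whose RMS energy falls below `threshold`.
    At a 16 kHz sample rate, frame_len=1600 is a 100 ms frame."""
    flags = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        flags.append(rms < threshold)
    return flags

# Toy signal: 100 ms of a 440 Hz tone followed by 100 ms of silence.
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
silence = [0.0] * 1600
print(silent_frames(tone + silence))  # [False, True]
```

Runs of consecutive `True` frames would then be merged into candidate silent periods for the idle-time logic.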
Crowd detection: needs a model that can ingest the segmented audio track to detect crowd noise. Speed is preferred over accuracy.
1. [Crowd class in AudioSet](https://research.google.com/audioset/dataset/crowd.html)
2. [Models for AudioSet](https://modelzoo.co/model/audioset)
Class activity detection: transcribe speech and then analyze it (with traditional NLP) in an interval before periods of crowd noise.
1. [Speech to Text](https://arxiv.org/abs/2010.05171)
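A first pass at the transcript analysis could be a keyword heuristic that checks whether the teacher launched a discussion shortly before a period of crowd noise. The cue-phrase list, segment format, and 30-second window below are assumptions for illustration, not a committed design:

```python
# Assumed cue phrases that suggest the teacher started a discussion.
DISCUSSION_CUES = [
    "discuss", "talk with your partner", "in your groups",
    "turn to your neighbor", "share with",
]

def teacher_initiated(transcript_segments, noise_start, window=30):
    """transcript_segments: list of (start_sec, end_sec, text) from the
    speech-to-text step. Returns True if any segment ending within
    `window` seconds before `noise_start` contains a discussion cue."""
    for start, end, text in transcript_segments:
        if noise_start - window <= end <= noise_start:
            lowered = text.lower()
            if any(cue in lowered for cue in DISCUSSION_CUES):
                return True
    return False

segments = [(0, 10, "Open your books to page twelve."),
            (40, 55, "Now discuss question three in your groups.")]
print(teacher_initiated(segments, noise_start=60))   # True
print(teacher_initiated(segments, noise_start=200))  # False
```

Crowd noise preceded by a cue would be classified as teacher-initiated discussion rather than "idle time"; a later iteration could swap this heuristic for a trained text classifier.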
Weekly Meeting Updates
---
Please click the link below for the most up-to-date information:
> [Meeting Notes](https://docs.google.com/document/d/1eF3DEn9EGnES7aL1CD_TgQ7kzZLhxZJJojxhyuTer10/edit)