# Real-Time Monitoring of Door Status in Public Transit Systems
## Abstract
Automatic Passenger Counters (APCs) are electronic devices installed on vehicles, such as buses and trains, to record the times and locations where passengers board and alight. These data are crucial for analyzing travel patterns and improving the operational efficiency of transportation services. However, integrating APC systems into older public transit vehicles can be challenging because accessing and correctly connecting the necessary wiring is often difficult. In our project, we use YOLOv8 to determine door status: computer vision techniques detect whether the doors are open or closed directly from camera footage. This approach not only enhances the accuracy of passenger counting but also reduces the complexity of integrating new systems into older vehicles. By minimizing the need for extensive hardware modifications, our solution offers a cost-effective and efficient method for upgrading public transit systems.
## 1. Introduction
Public transit systems play a critical role in urban mobility, facilitating efficient transportation for vast numbers of passengers every day. One key technology enhancing the operational efficiency of these systems is the Automatic Passenger Counter (APC), an electronic device that records the times and locations of passengers boarding and disembarking from transit vehicles like buses and trains. Accurate data from APCs are indispensable for analyzing travel patterns and optimizing transit service delivery. Moreover, these data contribute to resource allocation, ensuring that transit authorities can manage vehicle frequencies and capacities based on demand. However, the effectiveness of APCs hinges on real-time, accurate detection of vehicle door statuses—open or closed—as these signals trigger the start and end of the passenger counting process.
Integrating APC systems with the door status mechanisms of older public transit vehicles presents significant challenges. Many of these older vehicles lack the modern infrastructure necessary for straightforward integration, often requiring costly and disruptive retrofitting to install the necessary wiring and sensors. Such challenges highlight the need for an innovative solution that bypasses these issues, leading to the exploration of vision-based automatic door status monitoring technologies. These technologies aim to eliminate the reliance on physical wiring by using video analysis to determine door statuses, offering a non-intrusive and flexible solution suitable for both contemporary and legacy transit systems.
Developing a vision-based door status monitoring system involves overcoming several substantial challenges, chiefly the need for generalized door detection and precise localization through video feeds. The system must accurately differentiate and categorize the various door statuses—Closed, Opening, Open, and Closing—critical for counting passengers accurately and ensuring their safety during transit operations. The categorization process must be swift and robust, providing real-time feedback to transit operators and integrated systems.
However, implementing such a system is not without its obstacles. The system must demonstrate resilience against numerous potential interferences that could impair its accuracy. These include occlusions from passengers moving in front of the cameras, variable lighting conditions ranging from strong daylight to dim night scenes, and physical vibrations from the vehicle that could destabilize video output. Additionally, reflections from glass surfaces within the vehicle can create false images and distort the camera's view, further complicating the task of accurate door status detection. Moreover, the system must be robust enough to handle the wide variety of door designs and configurations found in different transit vehicles, which include single-leaf, double-leaf, sliding, and folding doors.
To address these challenges, this project employs advanced object detection algorithms, specifically utilizing the YOLOv8 model adapted for real-time video analysis. The system is enhanced with data augmentation techniques to handle diverse lighting and environmental conditions and includes a steady state averaging filter to mitigate transient fluctuations in detection output. This comprehensive approach ensures that the door status monitoring system is not only effective across various operational scenarios but also robust enough to handle the complexities of real-world transit environments. This project thus promises to significantly advance the capabilities of public transit management systems, leading to better service delivery and enhanced passenger safety.
## 2. Method
In this section, we present our proposed method, which is divided into two major parts: door status detection and a steady-state averaging filter. Together, these two components form the overall architecture of our door status monitoring system.

### 2.1 Door Status Detection
The project employs the YOLOv8 model, a robust deep learning framework known for its effectiveness in object detection tasks. Video frames from transit system cameras are input into the model, which has been trained to classify each frame as either ‘Open’ or ‘Close’. This classification is based on visual cues learned during the model training phase, which includes diverse scenarios of door operations under various lighting and crowd conditions.
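As a concrete illustration, the per-frame classification step might look like the following sketch using the Ultralytics API, where `door_cls_best.pt` is a placeholder name for the fine-tuned checkpoint described in Section 2.1.2:

```python
from ultralytics import YOLO

# Placeholder path: the checkpoint produced by fine-tuning YOLOv8s-cls.
model = YOLO("door_cls_best.pt")

def classify_frame(frame):
    """Classify a single BGR frame as 'Open' or 'Close' with a confidence score."""
    result = model.predict(frame, verbose=False)[0]
    top1 = result.probs.top1                     # index of the most likely class
    return result.names[top1], float(result.probs.top1conf)
```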
#### 2.1.1 Data Augmentation
In the project, three sample videos are provided, and the `cv2.VideoCapture` module is used to extract frames from them. These frames serve as the data source for training the door status detection model.
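A minimal frame-extraction sketch is shown below; the sampling stride and output paths are assumptions, chosen to avoid storing long runs of near-identical frames:

```python
import cv2

def extract_frames(video_path, out_dir, stride=5):
    """Save every `stride`-th frame of a video as a JPEG for labeling and training."""
    cap = cv2.VideoCapture(video_path)
    saved, idx = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:          # end of stream
            break
        if idx % stride == 0:
            cv2.imwrite(f"{out_dir}/frame_{idx:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```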
To enhance the model’s robustness and its ability to generalize across various lighting conditions, camera angles, and door designs, the following data augmentation techniques are utilized during training:
- **HSV Color Space Adjustment (hsv\_h=0.05)**: This technique adjusts the hue, saturation, and value of the images by a small amount to simulate different lighting conditions and color variations that might occur in real-world scenarios.
- **Horizontal and Vertical Flips (fliplr=0.5 and flipud=0.5)**: These augmentations randomly flip the images horizontally and vertically, enhancing the model's ability to recognize doors in various orientations. This is particularly crucial given the diversity of camera setups and angles in different transit systems.
- **Perspective Transformation (perspective=0.01)**: Minor changes in perspective are applied to the images to mimic the effect of different camera angles, helping the model to maintain accuracy even when the camera is not positioned optimally.
- **Dropout (dropout=0.3)**: Randomly omitting part of the feature detectors during training helps prevent the model from becoming too dependent on any specific aspect of the training data, enhancing its ability to generalize.
- **CLAHE (clipLimit=2.0, tileGridSize=(4,4))**: The Contrast Limited Adaptive Histogram Equalization (CLAHE) enhances the contrast of images by applying histogram equalization in small regions (tiles) of the image. By limiting the contrast amplification in homogeneous areas, CLAHE prevents over-amplification of noise and helps in highlighting features in images under varying lighting conditions.
- **Fish-Eye Reverse**: This augmentation simulates the strong radial distortion produced by fish-eye lenses, improving the model's feature extraction and its resilience when the input footage is heavily distorted; a sketch of CLAHE and the fish-eye warp follows this list.
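Because CLAHE and the fish-eye warp are not part of YOLOv8's built-in augmentation pipeline, they would be applied to the frames before training. The following OpenCV sketch uses the CLAHE parameters listed above; the fish-eye implementation and its distortion coefficient `k` are assumptions about how such a warp could be realized:

```python
import cv2
import numpy as np

def apply_clahe(bgr):
    # Equalize contrast on the lightness channel only, leaving colors intact.
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(4, 4))
    return cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

def apply_fisheye(bgr, k=0.4):
    # Barrel-distortion remap that mimics a fish-eye lens (k is an assumed strength).
    h, w = bgr.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    nx, ny = (xs - cx) / cx, (ys - cy) / cy      # normalized coordinates in [-1, 1]
    r2 = nx * nx + ny * ny
    map_x = cx + nx * (1.0 + k * r2) * cx        # sample farther out near the edges
    map_y = cy + ny * (1.0 + k * r2) * cy
    return cv2.remap(bgr, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)
```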

#### 2.1.2 YOLOv8 Classification Model
YOLOv8 has five pretrained classification models, each designed to balance various aspects of performance and efficiency. Considering the trade-off between hardware limitations and speed efficiency, YOLOv8s-cls is selected as the foundation for further training.
By choosing YOLOv8s-cls, the training process benefits from the model's lightweight architecture, which reduces the demand on hardware resources without significantly compromising accuracy. This selection is particularly advantageous for applications requiring real-time processing, where speed is crucial. The model's smaller size and faster processing capabilities allow for quicker iteration and testing, enabling more efficient development cycles.
Furthermore, the pretrained nature of YOLOv8s-cls provides a solid starting point, as it comes with weights already fine-tuned on large datasets. This pretraining helps accelerate convergence during the training phase, as the model starts from a state of reasonably good performance rather than from scratch. Consequently, fewer training epochs are needed, which further reduces the computational load and speeds up the overall training process. The training hyperparameters are listed in the table below:
| Batch Size | Image Size | Epochs | Dropout | Patience |
|:----------:|:----------:|:-----:|:-------:|:--------:|
| 8 | 960 | 30 | 0.2 | 5 |
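Under these settings, the fine-tuning run might be launched as follows; the dataset path is a placeholder for a standard classification folder layout (`train/Open`, `train/Close`, `val/Open`, `val/Close`), and the augmentation arguments follow Section 2.1.1:

```python
from ultralytics import YOLO

# Start from the pretrained small classification checkpoint (Section 2.1.2).
model = YOLO("yolov8s-cls.pt")

# "door_dataset" is a placeholder path to the extracted, labeled frames.
model.train(
    data="door_dataset",
    epochs=30, batch=8, imgsz=960, dropout=0.2, patience=5,
    hsv_h=0.05, fliplr=0.5, flipud=0.5, perspective=0.01,
)
```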
#### 2.1.3 Model Evaluation and Validation
After the classification model has been trained, it is crucial to evaluate and validate its performance to ensure it meets the desired standards and performs well in real-world scenarios. Several metrics are considered to validate the model’s performance:
- **Accuracy**: This primary metric indicates the percentage of correctly classified frames. It provides a general overview of the model’s effectiveness in distinguishing between different classes. However, accuracy alone may not be sufficient, especially in cases of imbalanced datasets where certain classes are underrepresented.
- **Precision and Recall**: These metrics are crucial for understanding the model’s performance in terms of false positives and false negatives, respectively.
  - **Precision** is defined as the ratio of true positive predictions to the sum of true positive and false positive predictions. In other words, Precision measures how many of the cases identified by the model as positive are truly positive.
  - **Recall**, on the other hand, is the ratio of true positive predictions to the sum of true positive and false negative predictions. In other words, Recall measures how many of the cases that should have been found are actually found by the model.
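Formally, with $TP$, $FP$, and $FN$ denoting true positives, false positives, and false negatives:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$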
### 2.2 Steady State Averaging Filter
To enhance the reliability of the door status detection, a steady state averaging filter is applied to the output sequence of the YOLOv8 model. This filter addresses the issue of transient fluctuations in the model’s output by averaging the detected statuses over a predefined number of frames. By implementing this filter, the system can mitigate short-lived anomalies in detection, such as those caused by abrupt changes in lighting or partial obstructions.
The steady state averaging filter works by taking the status detected in each frame and computing a rolling average over a set window of frames. This approach smooths out rapid, short-lived changes that might otherwise lead to incorrect status detections. For example, if a sudden shadow or a brief occlusion of the door occurs, these transient events would have a diminished effect on the overall detected status due to the averaging process.
The implementation of this steady state averaging filter involves several key steps. First, the detected statuses from each frame are converted into numerical values for easier manipulation: `Open` is mapped to 1, `Close` is mapped to 0, and any ambiguous state is mapped to `None`. A rolling average is then computed over these numerical values using a specified window size. The window size determines the number of frames over which the average is calculated, balancing responsiveness against stability.
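A minimal sketch of this filter is given below; the window size of 15 frames and the 0.5 decision threshold are assumed values for illustration:

```python
from collections import deque

# Numeric mapping from Section 2.2: Open -> 1, Close -> 0, ambiguous -> None.
STATUS_TO_NUM = {"Open": 1, "Close": 0}

class SteadyStateFilter:
    def __init__(self, window=15):         # window size is an assumption
        self.values = deque(maxlen=window)

    def update(self, status):
        """Feed one per-frame label; return the smoothed status."""
        value = STATUS_TO_NUM.get(status)   # ambiguous labels map to None
        if value is not None:
            self.values.append(value)
        if not self.values:
            return None                     # no reliable observation yet
        avg = sum(self.values) / len(self.values)
        return "Open" if avg >= 0.5 else "Close"
```

In a live pipeline, the label produced for each frame by the classifier would be passed to `update`, and the returned value, rather than the raw per-frame prediction, would drive the APC counting logic.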

## 3. Results
### Model Performance Evaluation
The performance of the YOLOv8 model in detecting and classifying door statuses in public transit systems was rigorously evaluated through a series of training and validation processes. The effectiveness of the model can be observed in two key results: the confusion matrix and the loss metrics during training.
### Confusion Matrix Analysis
The confusion matrix provides a visual and quantitative representation of the model's performance across two classes: 'Open' and 'Close'. As illustrated in the provided confusion matrix:
- **'Close' class**: The model demonstrates high accuracy for the 'Close' status, with a per-class accuracy of 1.00. This indicates that essentially all 'Close' frames were correctly identified.
- **'Open' class**: For the 'Open' status, the model also shows exemplary performance, with a per-class accuracy of 0.99, suggesting that it is highly capable of recognizing open doors and misses very few instances.
- **Misclassifications**: Errors are minimal: only 0.01 of frames whose true status is 'Open' are predicted as 'Close', and effectively none of the 'Close' frames are predicted as 'Open'.
This confusion matrix underscores the model’s robustness in distinguishing between open and closed states, which is crucial for ensuring that the APC system records accurate passenger counts.

### Parameters Comparison
| Preprocessing | CLAHE | Fish-eye reverse | Grayscale | Combined (50% probability each) | Combined (70% probability each) |
|:-------------:|:-----:|:----------------:|:---------:|:-------------------------------:|:-------------------------------:|
| **Acc.** | 40% | 40% | 40% | 60% | 60% |
This table compares several image preprocessing methods and their impact on model performance: CLAHE, fish-eye reverse, and grayscale. Each method applied individually yields a 40% success rate, correctly recognizing cases 1 and 7. Combining all effects, each applied with 50% probability, raises the success rate to 60% (cases 1, 3, and 7), while the 70% probability combination additionally detects case 5's door opening time, the best performance observed across all configurations. Consequently, the final configuration applies all effects, each with 70% probability.
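As a sketch of how the chosen configuration could be realized, the combined preprocessing might apply each effect independently with probability `p`, reusing the `apply_clahe` and `apply_fisheye` helpers sketched in Section 2.1.1; the independent per-effect sampling is an assumption about how "each with 70% probability" was implemented:

```python
import random
import cv2

def combined_augment(frame, p=0.7):
    # Apply each preprocessing effect independently with probability p.
    if random.random() < p:
        frame = apply_clahe(frame)
    if random.random() < p:
        frame = apply_fisheye(frame)
    if random.random() < p:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frame = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)   # keep 3 channels
    return frame
```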
### Training and Validation Loss Analysis
The training and validation loss curves show the model's learning progress over five epochs. Both curves display a steep decline in loss from the initial to the final epochs, indicating effective learning and convergence.
- **Training Loss**: Started at approximately 0.06 and steadily decreased to below 0.01 by the end of the training process. This sharp decline demonstrates the model's ability to adapt to the training data, optimizing its parameters efficiently to reduce prediction error.
- **Validation Loss**: Began around 0.105, mirrored the training loss by dropping significantly, and stabilized just above 0.08. The parallel downward trend of the two curves suggests that the model is not severely overfitting and generalizes reasonably well to unseen data.
## 4. Discussion
In the application of this system, the integration of YOLOv8 for frame classification and the steady state averaging filter for output stabilization offers a comprehensive approach to door status monitoring. The system’s performance has been evaluated under various operational conditions, demonstrating a high degree of accuracy in real-time door status detection.
Challenges encountered during the project include:
- **Model Overfitting**: Initial phases of the project showed tendencies of the model to overfit to the training data, which was mitigated by augmenting the dataset and applying regularization techniques.
- **Real-Time Processing Requirements**: Ensuring that video frame processing and status detection occur in real time presented technical challenges, particularly in maintaining low latency and high throughput under all traffic conditions.
## 5. Conclusion
In this concluding section, we summarize the strengths and limitations of our proposed method, provide insights into the obtained results, and point to potential improvements and future directions that could further enhance the system's performance.
### Proposed Method
The integration of advanced object detection models with post-processing filters allows the proposed system to accurately and reliably monitor door statuses in a real-time setting. The system’s ability to adapt to different environmental conditions without significant degradation in performance is a testament to its robustness.
### Limitations and Challenges
Despite its effectiveness, the system’s performance is heavily dependent on the quality of the input video stream. Low-resolution or highly compressed video can degrade the model’s accuracy. Additionally, the system’s complexity and computational requirements pose challenges for deployment on platforms with limited processing capabilities.
### Future Work
To further enhance the system, future work could focus on:
- **Expanding the Training Dataset**: Including more diverse scenarios in the training dataset to improve the model's generalizability.
- **Optimizing Computational Efficiency**: Researching ways to reduce the model's computational demands to facilitate easier deployment on a wider range of hardware platforms.