Peter's Thesis

# FPGA-Accelerated Reinforcement Learning Control for a Mecanum AMR on Kria KR260

![Compressed-AMR](https://hackmd.io/_uploads/rkmepQukzg.png)

## :beginner: Thesis Info

- Research Topic: FPGA-Accelerated Reinforcement Learning Control for a Mecanum AMR on Kria KR260
- Status:
    - [x] In progress
    - [ ] In review. Reviewer:
    - [ ] Finished

## :triangular_flag_on_post: Problem space

Autonomous Mobile Robots (AMRs), especially mecanum-wheel platforms, require accurate state estimation and fast control inference to achieve stable trajectory tracking in real-time environments. However, several practical problems remain:

1. Raw sensor data from LiDAR, wheel encoders, IMU, and camera observations may be noisy or affected by environmental conditions, leading to inaccurate estimation of robot states such as position, orientation, and velocity.
2. Reinforcement Learning (RL) controllers may become unstable or less robust if the input state is not well estimated.
3. CPU-only inference may introduce latency and jitter, which can reduce real-time control quality.
4. The integration of heterogeneous computing resources (CPU + FPGA) on embedded robotic platforms such as Kria KR260 is still underexplored for AMR control.

Therefore, this research aims to investigate whether EKF-based multi-sensor fusion, including camera-assisted observations, together with FPGA-accelerated RL inference can improve control performance, robustness, and real-time responsiveness for a mecanum AMR.

## 📈 Solution space

- Idea 1: Use an Extended Kalman Filter (EKF) on the KR260 CPU to fuse LiDAR, encoder, IMU, and camera-derived observations for better state estimation.
- Idea 2: Deploy a compact and quantized RL policy (e.g., PPO) on the FPGA fabric for low-latency real-time inference.
- Idea 3: Build a CPU-FPGA co-design architecture where state estimation is executed on the CPU and control inference is executed on the FPGA.
- Idea 4: Compare system performance when the RL controller receives raw state input versus EKF-fused state input.
- Idea 5: Evaluate tracking accuracy, latency, jitter, and robustness under disturbances such as sensor noise or wheel slip.
- Idea 6: Use a camera sensor (e.g., RealSense) to provide visual odometry or landmark-based pose correction for EKF fusion.

## :exclamation: Risks

1. **Risk:** FPGA deployment of RL policy may be difficult due to model compatibility, quantization loss, or toolchain complexity.
   **Hedge:** Start with a small neural network and use a simpler policy architecture that can be quantized and compiled more easily.
2. **Risk:** EKF tuning for mecanum robot kinematics and multi-sensor fusion may take significant time.
   **Hedge:** Begin with encoder + IMU fusion first, then integrate LiDAR-based and camera-based pose correction.
3. **Risk:** LiDAR and camera outputs may not be directly suitable as EKF inputs.
   **Hedge:** Use LiDAR through localization or scan-matching outputs, and use camera through visual odometry or landmark detection outputs, as pose observations for EKF updates.
4. **Risk:** The full system may become too large in scope for one thesis.
   **Hedge:** Limit the study to indoor 2D trajectory tracking and one RL policy only.

## :feet: Implementation

The system is implemented on a mecanum AMR platform using the Kria KR260 as the main embedded computing board. The overall architecture separates perception, estimation, control, and actuation tasks:

- LiDAR, wheel encoder, IMU, and camera provide real-time sensor data.
- Each sensor first goes through a preprocessing stage to extract useful motion or localization information.
- The CPU runs the EKF to fuse sensor measurements and estimate the robot state.
- The estimated state is compared with the reference trajectory to compute tracking error.
- The FPGA executes RL inference with low latency to generate motion commands.
- The motion commands are converted into wheel-level commands for the mecanum drive.
- The motor driver executes the commands and drives the AMR.
- The robot motion is continuously fed back through the sensors, forming a closed-loop control system.

### Workflow

The system starts by collecting data from the wheel encoder, IMU, LiDAR, and camera sensor. Each sensor first goes through a preprocessing stage to extract useful motion or localization information, such as wheel odometry, inertial measurements, LiDAR-based localization or scan matching, and camera-based visual observations such as visual odometry or landmark detection. These processed measurements are then fused by an Extended Kalman Filter (EKF) to estimate the robot state, including position, orientation, and velocity. The estimated state is compared with the reference trajectory to compute the tracking error. Based on this error and the estimated state, the controller or reinforcement learning policy generates motion commands. These commands are converted into wheel-level commands for the mecanum drive system, then executed by the motor driver. The robot motion is continuously fed back through the sensors, forming a closed-loop control system.
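Two steps of this workflow, the tracking-error computation and the conversion of body velocity commands into wheel-level commands, can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis implementation: the `Pose` type, the function names, the body-frame error convention, and the geometry values (`wheel_radius`, `half_length`, `half_width`) are all hypothetical. The inverse kinematics assumes the common X-roller mecanum layout with x forward, y to the left, and counter-clockwise rotation positive; a different roller orientation or wheel ordering would flip some signs.

```python
import math
from typing import NamedTuple, Tuple


class Pose(NamedTuple):
    """Planar pose (x, y in metres, psi in radians). Hypothetical helper type."""
    x: float
    y: float
    psi: float


def wrap_angle(angle: float) -> float:
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def tracking_error(estimated: Pose, reference: Pose) -> Pose:
    """Tracking error expressed in the robot body frame (assumed convention)."""
    dx = reference.x - estimated.x
    dy = reference.y - estimated.y
    c, s = math.cos(estimated.psi), math.sin(estimated.psi)
    # Rotate the world-frame position error into the body frame.
    return Pose(
        x=c * dx + s * dy,
        y=-s * dx + c * dy,
        psi=wrap_angle(reference.psi - estimated.psi),
    )


def mecanum_inverse_kinematics(
    vx: float,
    vy: float,
    omega: float,
    wheel_radius: float = 0.05,  # metres, placeholder value
    half_length: float = 0.20,   # half of wheelbase, placeholder value
    half_width: float = 0.15,    # half of track width, placeholder value
) -> Tuple[float, float, float, float]:
    """Body velocity (m/s, m/s, rad/s) to wheel speeds in rad/s.

    Returns (front_left, front_right, rear_left, rear_right).
    """
    k = half_length + half_width
    front_left = (vx - vy - k * omega) / wheel_radius
    front_right = (vx + vy + k * omega) / wheel_radius
    rear_left = (vx + vy - k * omega) / wheel_radius
    rear_right = (vx - vy + k * omega) / wheel_radius
    return front_left, front_right, rear_left, rear_right


if __name__ == "__main__":
    error = tracking_error(Pose(0.0, 0.0, 0.0), Pose(1.0, 0.5, 0.1))
    print("body-frame error:", error)
    print("wheel speeds:", mecanum_inverse_kinematics(0.5, 0.0, 0.0))
```

In the planned architecture, the error from `tracking_error` and the EKF state would form the policy input, and the policy output would pass through `mecanum_inverse_kinematics` only if the policy emits body velocities rather than wheel speed references.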
### :small_blue_diamond: Flow

```mermaid
flowchart LR
    A[Sensor Data Collection<br/>Encoder, IMU, LiDAR, Camera] --> B[Sensor Pre-processing]
    B --> C[EKF-based State Estimation]
    C --> D[Estimated Robot State]
    E[Reference Trajectory] --> F[Tracking Error Computation]
    D --> F
    F --> G[Controller / RL Policy]
    G --> H[Inverse Kinematics for Mecanum Wheels]
    H --> I[Motor Driver]
    I --> J[AMR Motion]
    J --> A
```

### :small_blue_diamond: Specs

| **Item** | **Specs** | **Note** |
|:--------: |:---------: |:-------: |
| Robot Platform | Mecanum AMR | 4-wheel omnidirectional mobile robot |
| Main Board | Kria KR260 | Embedded CPU + FPGA platform |
| Sensors | LiDAR, Encoder, IMU, Camera | Camera provides visual odometry or landmark-based correction |
| State Estimation | Extended Kalman Filter (EKF) | Runs on CPU |
| Control Method | Reinforcement Learning Policy | PPO preferred for first implementation |
| RL Deployment | Quantized inference on FPGA | Focus on low latency and determinism |
| Output | Body velocity or wheel speed reference | Final choice should be fixed during implementation |
| Software Stack | ROS 2 + Ubuntu + FPGA toolchain | For communication and deployment |
| Evaluation Metrics | Latency, jitter, tracking error, robustness | Compare raw state vs EKF-fused state |
| Operating Environment | Indoor 2D trajectory tracking | Controlled test environment |

### :small_blue_diamond: Design

### System Architecture

- **Perception Layer:** LiDAR, encoder, and IMU collect raw sensor data and provide motion or localization measurements after preprocessing.
- **Camera Module:** The camera provides visual odometry or landmark-based pose correction.
- **Estimation Layer (CPU):** EKF fuses LiDAR, encoder, IMU, and camera-derived observations to estimate $(x, y, \psi, v_x, v_y, \omega)$.
- **Control Layer (FPGA):** RL policy inference receives the estimated state and tracking error, then generates control commands.
- **Actuation Layer:** Inverse kinematics converts motion commands into wheel-level commands, and the motor driver actuates the mecanum wheels.
- **Evaluation Layer:** The system logs trajectory tracking accuracy, latency, jitter, repeatability, and robustness.

### Layered Workflow

```mermaid
flowchart TD
    subgraph Perception Layer
        A[Encoder]
        B[IMU]
        C[LiDAR]
        E[Sensor Pre-processing]
        A --> E
        B --> E
        C --> E
    end
    subgraph Estimation Layer
        F[EKF-based State Estimation]
        G[Estimated State]
        E --> F
        F --> G
    end
    subgraph Control Layer
        H[Reference Trajectory]
        I[Tracking Error Computation]
        J[Controller / RL Policy]
        H --> I
        G --> I
        I --> J
    end
    subgraph Actuation Layer
        K[Inverse Kinematics]
        L[Motor Driver]
        M[Mecanum AMR Motion]
        J --> K
        K --> L
        L --> M
    end
    M --> E
```

### Experimental Comparison

Three main configurations can be compared, together with a camera ablation:

1. CPU-only inference with raw state input
2. CPU-only inference with EKF-fused state input
3. FPGA inference with EKF-fused state input
4. EKF-based state estimation with camera versus without camera

This design supports both algorithm-level and system-level analysis.

## 💬 Open Questions

1. Should the RL policy output body velocity commands $(v_x, v_y, \omega)$ or directly output wheel speed references?
2. How should LiDAR be integrated into the EKF: directly, or through a localization/scan-matching module?
3. Should the camera be used for visual odometry, landmark detection, or both?
4. Is PPO sufficient, or should SAC also be tested as a second benchmark?
5. What level of quantization is acceptable without significantly degrading control quality?
6. Which performance metric should be emphasized most in the thesis: tracking accuracy, robustness, or latency?

### :small_blue_diamond: A checklist for stakeholders

| Question | Answer |
| ----------------------------------------------- |:------:|
| 1. What is the result you want? | A real-time mecanum AMR control system using EKF-based multi-sensor state estimation and FPGA-accelerated RL inference on KR260. |
| 2. Why is this result important? | It shows how CPU-FPGA co-design and camera-assisted multi-sensor fusion can improve AMR control performance and real-time capability. |
| 3. How will you evaluate progress? | By measuring implementation completion, inference latency, trajectory tracking error, and robustness under disturbances, with and without camera-assisted observations. |
| 4. How can you influence the result? | By optimizing EKF design, camera integration, RL model size, FPGA deployment pipeline, and experiment design. |
| 5. Who is responsible for the results? | The researcher / thesis student. |
| 6. How do you know you have achieved your goal? | When the AMR can follow trajectories stably and the camera-assisted FPGA-based system shows measurable benefit over the baseline. |
| 7. How often will you review? | Weekly with advisor and after each implementation milestone. |

---

## Suggested Research Direction Statement

This research focuses on developing a real-time control architecture for a mecanum Autonomous Mobile Robot (AMR) based on the Kria KR260 platform. The main idea is to combine EKF-based multi-sensor state estimation on the CPU, using LiDAR, wheel encoder, IMU, and camera-derived observations, with FPGA-accelerated Reinforcement Learning inference for motion control. By comparing raw-state control and EKF-fused-state control, as well as configurations with and without camera-assisted observations, the study aims to evaluate whether improved state quality and heterogeneous CPU-FPGA task partitioning can enhance trajectory tracking accuracy, robustness, and real-time performance.