# R.S. Research Journal - Nemodrive

<style>.markdown-body { max-width: 1250px; }</style>

<h2> The results below are for less overlap between train & validation </h2>

<h3> Week 11 </h3>

<h5> Baseline closed-loop</h5>

Experiment: nvidia_speed
Autonomy = 0.45
Interventions = 620
Mean Distance = 0.36, Std Distance = 0.40
Mean Angle = 3.10, Std Angle = 3.81
Videos Time = 3060.16
Videos Time + Time Penalty = 6780.16

<h5> Baseline open-loop</h5>

| Experiment | SUM | MEAN | STD | MIN | MAX | MEDIAN |
| ---------- | --------- | ----- | ----- | ----- | ----- | ------ |
| Baseline | 16614.705 | 1.797 | 2.505 | 0.035 | 6.875 | 0.139 |

<h5> Closed-loop evaluation for the new model using new augmentations (TODO) </h5>

| Model | Speed | Depth | Disp | AUTONOMY | NUM_INTERV | MEAN DIST | MEAN ANGLE | VIDEOS LENGTH | VIDEOS_LENGTH + PENALTY |
| ------ | ----- | ----- | ---- | -------- | ---------- | --------- | ---------- | ------------- | ----------------------- |
| RESNET | YES | NO | NO | 0.70 | 218 | 0.33 | 2.10 | 3060.16 | 4368.16 |
| RESNET | YES | YES | NO | 0.71 | 208 | 0.32 | 2.13 | 3060.16 | 4308.16 |
| RESNET | YES | NO | YES | 0.70 | 216 | 0.33 | 2.03 | 3060.16 | 4356.16 |

<h5> Open-loop evaluation for the new model with depth map </h5>

| Model | Speed | Stacked | Depth(aux) | Flow(aux) | MEAN | STD | MEDIAN | MIN | MAX | Sample size |
| ------ | ----- | ------- | ---------- | --------- | ---- | ---- | ------ | ---- | ---- | ----------- |
| SIMPLE | YES | NO | NO | NO | 0.755 | 1.008 | 0.361 | 0.003 | 14.118 | 9083 |
| RESNET | YES | NO | NO | NO | 0.711 | 1.304 | 0.175 | 0.001 | 11.217 | 9083 |
| SIMPLE | YES | NO | YES | NO | 0.749 | 1.054 | 0.333 | 0.002 | 18.529 | 9083 |
| RESNET | YES | NO | YES | NO | 0.725 | 1.320 | 0.176 | 0.001 | 12.308 | 9083 |
| SIMPLE | YES | NO | YES | YES | 0.445 | 0.842 | 0.147 | 0.000 | 15.915 | 9083 |
| RESNET | YES | NO | YES | YES | 0.432 | 0.898 | 0.0999 | 0.001 | 12.570 | 9083 |

| Model | Speed | Stacked | Depth(aux) | Flow(aux) | MEAN | STD | MEDIAN | MIN | MAX | Sample size |
| ------ | ----- | ------- | ---------- | --------- | ---- | ---- | ------ | ---- | ---- | ----------- |
| SIMPLE | NO | YES | NO | NO | 0.470 | 0.884 | 0.170 | 0.001 | 15.600 | 9083 |
| RESNET | NO | YES | NO | NO | 0.430 | 0.945 | 0.090 | 0.002 | 12.256 | 9083 |
| SIMPLE | NO | YES | YES | NO | 0.500 | 1.005 | 0.152 | 0.000 | 19.458 | 9083 |
| RESNET | NO | YES | YES | NO | 0.432 | 0.937 | 0.091 | 0.001 | 13.003 | 9083 |
| SIMPLE | NO | YES | YES | YES | 0.428 | 0.834 | 0.141 | 0.001 | 18.507 | 9083 |
| RESNET | NO | YES | YES | YES | 0.428 | 0.847 | 0.121 | 0.001 | 10.236 | 9083 |

<h5> Open-loop evaluation for the new model with disparity map (TODO) </h5>

| Model | Speed | Stacked | Disp(aux) | Flow(aux) | MEAN | STD | MEDIAN | MIN | MAX | Sample size |
| ------ | ----- | ------- | --------- | --------- | ---- | ---- | ------ | ---- | ---- | ----------- |
| SIMPLE | YES | NO | NO | NO | 0.755 | 1.008 | 0.361 | 0.003 | 14.118 | 9083 |
| RESNET | YES | NO | NO | NO | 0.711 | 1.304 | 0.175 | 0.001 | 11.217 | 9083 |
| SIMPLE | YES | NO | YES | NO | 0.755 | 0.996 | 0.385 | 0.002 | 15.949 | 9083 |
| RESNET | YES | NO | YES | NO | 0.684 | 1.266 | 0.167 | 0.001 | 13.099 | 9083 |
| SIMPLE | YES | NO | YES | YES | 0.443 | 0.859 | 0.139 | 0.001 | 16.358 | 9083 |
| RESNET | YES | NO | YES | YES | 0.426 | 0.870 | 0.107 | 0.000 | 10.500 | 9083 |

| Model | Speed | Stacked | Disp(aux) | Flow(aux) | MEAN | STD | MEDIAN | MIN | MAX | Sample size |
| ------ | ----- | ------- | --------- | --------- | ---- | ---- | ------ | ---- | ---- | ----------- |
| SIMPLE | NO | YES | NO | NO | 0.470 | 0.884 | 0.170 | 0.001 | 15.600 | 9083 |
| RESNET | NO | YES | NO | NO | 0.430 | 0.945 | 0.090 | 0.002 | 12.256 | 9083 |
| SIMPLE | NO | YES | YES | NO | 0.503 | 1.036 | 0.147 | 0.000 | 18.207 | 9083 |
| RESNET | NO | YES | YES | NO | 0.418 | 0.921 | 0.096 | 0.000 | 12.379 | 9083 |
| SIMPLE | NO | YES | YES | YES | 0.457 | 0.820 | 0.178 | 0.001 | 15.430 | 9083 |
| RESNET | NO | YES | YES | YES | 0.435 | 0.953 | 0.084 | 0.000 | 11.929 | 9083 |

<h5>Closed-loop evaluation for the old model with the new augmentations</h5>

Experiment: resnet_speed_augm
Autonomy = 0.73
Interventions = 187
Mean Distance = 0.27, Std Distance = 0.28
Mean Angle = 2.03, Std Angle = 2.76
Videos Time = 3060.16
Videos Time + Time Penalty = 4182.16

<h5>Closed-loop evaluation for the old model with old augmentations</h5>

| Model | Speed | Augm | AUTONOMY | NUM_INTERV | MEAN DIST | MEAN ANGLE | VIDEOS LENGTH | VIDEOS_LENGTH + PENALTY |
| ------ | ----- | ---- | -------- | ---------- | --------- | ---------- | ------------- | ----------------------- |
| SIMPLE | YES | NO | 0.66 | 267 | 0.36 | 2.13 | 3060.16 | 4662.16 |
| RESNET | YES | NO | 0.67 | 248 | 0.36 | 2.02 | 3060.16 | 4548.16 |
| SIMPLE | YES | YES | 0.76 | 165 | 0.22 | 1.79 | 3060.16 | 4050.16 |
| RESNET | YES | YES | 0.80 | 128 | 0.21 | 1.71 | 3060.16 | 3828.16 |

<!--
| Model | Speed | Augm | AUTONOMY | NUM_INTERV | MEAN DIST | MEAN ANGLE | VIDEOS LENGTH | VIDEOS_LENGTH + PENALTY |
| ------ | ----- | ---- | -------- | ---------- | --------- | ---------- | ------------- | ----------------------- |
| SIMPLE | YES | NO | 0.66 | 264 | 0.38 | 2.25 | 3060.16 | 4644.16 |
| RESNET | YES | NO | 0.67 | 250 | 0.37 | 2.12 | 3060.16 | 4560.16 |
| SIMPLE | YES | YES | 0.73 | 189 | 0.22 | 2.06 | 3060.16 | 4194.16 |
| RESNET | YES | YES | 0.78 | 144 | 0.21 | 2.01 | 3060.16 | 3924.16 |
-->

<h5>Open-loop evaluation for the old model </h5>

| Model | Speed | Augm | MEAN | STD | MEDIAN | MIN | MAX | Sample size |
| ------ | ----- | ---- | ---- | ---- | ------ | ---- | ---- | ----------- |
| SIMPLE | YES | NO | 0.673 | 0.933 | 0.302 | 0.000 | 10.877 | 9245 |
| RESNET | YES | NO | 0.645 | 1.053 | 0.200 | 0.001 | 10.595 | 9245 |
| SIMPLE | YES | YES | 0.704 | 0.993 | 0.331 | 0.000 | 19.764 | 9245 |
| RESNET | YES | YES | 0.685 | 1.027 | 0.311 | 0.001 | 12.665 | 9245 |

<h5> Intervention points for resnet + speed + augm (TODO) </h5>

<!-- ![](https://i.imgur.com/V11bmef.jpg) -->
![](https://i.imgur.com/UFKMTFG.jpg)

<h5> Train/Test split</h5>

Train:
![](https://i.imgur.com/KPjt6Of.jpg)

Test:

![](https://i.imgur.com/4BWpXHp.jpg)

---

<h2> The results below are for a random split</h2>

<h3> Week 10 </h3>

<h5>Open-loop evaluation for the old model </h5>

The perspective augmentation used for training assumes that the ground lies on the plane y=0.

| Model | Speed | Augm | MEAN | STD | MEDIAN | MIN | MAX | Sample size |
| ------ | ----- | ---- | ---- | ---- | ------ | ---- | ---- | ----------- |
| SIMPLE | YES | NO | 0.564 | 1.037 | 0.155 | 0.000 | 13.831 | 8741 |
| RESNET | YES | NO | 0.577 | 1.115 | 0.146 | 0.001 | 20.350 | 8741 |
| SIMPLE | YES | YES | 0.589 | 1.339 | 0.146 | 0.000 | 20.350 | 8741 |
| RESNET | YES | YES | 0.609 | 1.012 | 0.210 | 0.002 | 13.232 | 8741 |

<h5>Closed-loop evaluation for the old model </h5>

The perspective augmentation used for testing assumes that the ground lies on the plane y=0 (the same augmentation as for training).

| Model | Speed | Augm | AUTONOMY | NUM_INTERV | MEAN DIST | MEAN ANGLE | VIDEOS LENGTH | VIDEOS_LENGTH + PENALTY |
| ------ | ----- | ---- | -------- | ---------- | --------- | ---------- | ------------- | ----------------------- |
| SIMPLE | YES | NO | 0.72 | 193 | 0.30 | 1.91 | 2946.26 | 4104.26 |
| RESNET | YES | NO | 0.73 | 182 | 0.30 | 1.88 | 2946.26 | 4038.26 |
| SIMPLE | YES | YES | 0.79 | 129 | 0.16 | 1.78 | 2946.26 | 3720.26 |
| RESNET | YES | YES | 0.80 | 122 | 0.21 | 1.92 | 2946.26 | 3678.26 |

<h5> Closed-loop evaluation for resnet with augm </h5>

The perspective augmentation used for testing uses the depth map (a different augmentation than the one used in training).

Experiment: resnet_speed_augm
Autonomy = 0.78
Interventions = 140
Mean Distance = 0.26, Std Distance = 0.28
Mean Angle = 1.94, Std Angle = 2.59
Videos Time = 2946.26
Videos Time + Time Penalty = 3786.26

Experiment: Baseline
Autonomy = 0.42
Interventions = 676
Mean Distance = 0.37, Std Distance = 0.42
Mean Angle = 3.38, Std Angle = 3.99
Videos Time = 2946.26
Videos Time + Time Penalty = 7002.26

<h3> Week 9 </h3>

<h5> RESIDUAL Optical Flow </h5>

The pose estimation from monodepth captures the movement of the camera and, implicitly, the movement of static objects (buildings, poles, etc.). This approach does not capture the motion of the dynamic objects we are most interested in (cars, people, etc.).

A network is trained to generate the RESIDUAL optical flow. The final flow is defined as
$$ Flow = Flow_{rigid} + Flow_{residual} $$

The model is trained using a photometric reconstruction loss:
$$ 0.85 \times SSIM + 0.15 \times L_{1} $$

Results after the correction:

![](https://i.imgur.com/VcybhXa.png)
![](https://i.imgur.com/JgrD1W3.png)
![](https://i.imgur.com/jonJzYa.png)
![](https://i.imgur.com/3tVdlgM.png)

---

![](https://i.imgur.com/7RLgv27.png)
![](https://i.imgur.com/PPezut3.png)
![](https://i.imgur.com/a7ht9jv.png)
![](https://i.imgur.com/J3fERrV.png)

Due to the photometric reconstruction loss, there are some problems in low light (car lights):

![](https://i.imgur.com/gtK8f8C.png)
![](https://i.imgur.com/AVKunN1.png)

<h5> Steering Open-loop evaluation</h5>

| Model | Speed | Stacked | Depth(aux) | Flow(aux) | MEAN | STD | MEDIAN | MIN | MAX | Sample size |
| ------ | ----- | ------- | ---------- | --------- | ---- | ---- | ------ | ---- | ---- | ----------- |
| SIMPLE | YES | NO | NO | NO | 0.624 | 1.190 | 0.196 | 0.003 | 17.186 | 8741 |
| RESNET | YES | NO | NO | NO | 0.628 | 1.061 | 0.219 | 0.007 | 11.703 | 8741 |
| SIMPLE | YES | NO | YES | NO | 0.607 | 1.164 | 0.183 | 0.002 | 14.366 | 8741 |
| RESNET | YES | NO | YES | NO | 0.658 | 1.384 | 0.116 | 0.001 | 13.572 | 8741 |
| SIMPLE | YES | NO | YES | YES | 0.407 | 0.871 | 0.108 | 0.000 | 12.140 | 8741 |
| RESNET | YES | NO | YES | YES | 0.398 | 0.908 | 0.070 | 0.001 | 11.795 | 8741 |

| Model | Speed | Stacked | Depth(aux) | Flow(aux) | MEAN | STD | MEDIAN | MIN | MAX | Sample size |
| ------ | ----- | ------- | ---------- | --------- | ---- | ---- | ------ | ---- | ---- | ----------- |
| SIMPLE | NO | YES | NO | NO | 0.410 | 0.853 | 0.106 | 0.000 | 10.980 | 8741 |
| RESNET | NO | YES | NO | NO | 0.420 | 0.877 | 0.102 | 0.002 | 11.646 | 8741 |
| SIMPLE | NO | YES | YES | NO | 0.456 | 0.837 | 0.162 | 0.001 | 11.502 | 8741 |
| RESNET | NO | YES | YES | NO | 0.430 | 0.962 | 0.085 | 0.001 | 11.438 | 8741 |
| SIMPLE | NO | YES | YES | YES | 0.398 | 0.882 | 0.100 | 0.000 | 11.843 | 8741 |
| RESNET | NO | YES | YES | YES | 0.403 | 0.823 | 0.112 | 0.002 | 11.408 | 8741 |

<h5> Partial closed-loop evaluation of RESNET + SPEED + DEPTH </h5>

The model was picked considering the lowest median values. There are some videos in which the ground truth value is wrong, which affects the mean. Statistics show that the distribution is skewed. (I have to plot it!)

Experiment: resnet_rgb_speed_depth_balance
Autonomy = 0.75
Interventions = 163
Mean Distance = 0.29, Std Distance = 0.33
Mean Angle = 2.00, Std Angle = 2.65
Videos Time = 2946.26
Videos Time + Time Penalty = 3924.26

Experiment: resnet_rgb_speed_balance
Autonomy = 0.73
Interventions = 185
Mean Distance = 0.35, Std Distance = 0.35
Mean Angle = 2.18, Std Angle = 2.75
Videos Time = 2946.26
Videos Time + Time Penalty = 4056.26

<h6>Some samples of individual results (27 videos) </h6>

Experiment: resnet_rgb_speed_depth_balance

| Video | Autonomy | Num. interv | Video length |
| ---------------- | -------- | ----------- | ------------ |
| eb21da8af9354337 | 0.54 | 5 | 36.0 |
| 59ce46ec60254518 | 0.60 | 4 | 36.0 |
| ee33b6e2499e4e9e | 0.64 | 6 | 66.0 |
| 3d05d3d3f3d5419d | 0.66 | 3 | 36.0 |
| 62cf50ef91dc4fe3 | 0.66 | 3 | 36.0 |
| 4e9b6559753d4125 | 0.66 | 3 | 36.0 |
| 3bf0bc84ada845d0 | 0.66 | 3 | 36.0 |
| ccc46f137b8044eb | 0.66 | 3 | 36.0 |
| 8d9913473dc44e63 | 0.66 | 3 | 36.0 |
| 750040ae79da4d65 | 0.75 | 2 | 36.0 |
| 7c53a433656b4f9f | 0.75 | 2 | 36.0 |
| a52f549f983d4d43 | 0.78 | 3 | 64.0 |
| 947d8caf831b4bff | 0.78 | 3 | 66.4 |
| 1c820d64b4af4c85 | 0.85 | 1 | 36.0 |
| 687a5dee0e2947a5 | 0.85 | 1 | 36.0 |
| 631e81945f814203 | 0.85 | 1 | 36.0 |
| 0f7983aee856498c | 0.85 | 1 | 36.0 |
| c1a7e181ce2946b9 | 0.85 | 1 | 36.0 |
| 98b0935d5af84f66 | 0.85 | 1 | 36.0 |
| 68c845f010764969 | 0.85 | 1 | 36.0 |
| 21360f50b5ec46fc | 1.00 | 0 | 36.0 |
| 5b8e285bb95f4634 | 1.00 | 0 | 36.0 |
| 708b34da06f94554 | 1.00 | 0 | 36.0 |
| 41b3d047e5544ca0 | 1.00 | 0 | 36.0 |
| 4cc9797ac787401c | 1.00 | 0 | 36.0 |
| 0ef581bf4a424ef1 | 1.00 | 0 | 36.0 |

[All results here for resnet_rgb_speed_depth_balance ](https://drive.google.com/drive/folders/1ZKP2OuCD1JR-fYA2KNe3S0vERNz4GfT_?usp=sharing)

<h5> Intervention Points </h5>

Most of the intervention points are located at intersections.

![](https://i.imgur.com/Kjau1SC.jpg)

<h3> Week 7 & 8</h3>

<h5> RIGID Optical Flow</h5>

[GIF samples](https://drive.google.com/drive/folders/1wG-c6JM6Fvk6bb6W0odXqvWTgoERizVm?usp=sharing)

1. ![](https://i.imgur.com/wtl1X2c.png)

---

2. ![](https://i.imgur.com/LSQGHXF.png)

---

3. ![](https://i.imgur.com/NLbBzvL.png)

---

4. ![](https://i.imgur.com/KTykiD1.png)

---

5. ![](https://i.imgur.com/VNxWXaS.png)

---

6.
![](https://i.imgur.com/miD6Tpe.png)

<h5> Quantitative evaluation inpaint Nuscenes</h5>

| Mask to image ratio | L1 | PSNR | SSIM |
| ------------------- | ----- | ----- | ---- |
| \[0.01, 0.1) | 1.44 | 31.36 | 0.96 |
| \[0.1, 0.2) | 4.68 | 24.05 | 0.90 |
| \[0.2, 0.3) | 9.89 | 19.47 | 0.84 |
| \[0.3, 0.4) | 16.20 | 16.90 | 0.77 |
| \[0.4, 0.5) | 25.61 | 14.60 | 0.67 |

<h5> Quantitative evaluation inpaint UPB</h5>

| Mask to image ratio | L1 | PSNR | SSIM |
| ------------------- | ----- | ----- | ---- |
| \[0.01, 0.1) | 1.88 | 29.17 | 0.95 |
| \[0.1, 0.2) | 5.70 | 22.26 | 0.90 |
| \[0.2, 0.3) | 12.79 | 17.16 | 0.83 |
| \[0.3, 0.4) | 20.10 | 14.86 | 0.77 |
| \[0.4, 0.5) | 28.31 | 13.50 | 0.66 |

<h5> Quantitative evaluation inpaint Nuscenes (GAN)</h5>

| Mask to image ratio | L1 | PSNR | SSIM |
| ------------------- | ----- | ----- | ---- |
| \[0.01, 0.1) | 1.53 | 31.00 | 0.96 |
| \[0.1, 0.2) | 5.03 | 23.85 | 0.90 |
| \[0.2, 0.3) | 11.10 | 18.88 | 0.83 |
| \[0.3, 0.4) | 18.24 | 16.17 | 0.75 |
| \[0.4, 0.5) | 27.64 | 13.96 | 0.65 |

<h5> TODO </h5>

1. Open-loop evaluation for multiple models
2. ~~Finish updated closed-loop evaluator~~
3. Closed-loop evaluation for best models
4. ~~Write intervention points plotter for UPB data~~
5. Plot points of intervention on the UPB map
6. ~~Quantitative evaluation inpainting nuscenes~~
7. ~~Quantitative evaluation inpainting upb~~
8. NVIDIA feature extractor + different inputs (RGB img, depth, flow)

<h3> Week 6 </h3>

<h5> Evaluation </h5>

| Model | Augmentation | Dataset Balancing | Autonomy (mu, std) | Interventions (mu, std) |
| ------ | ------------ | ----------------- | ------------------ | ----------------------- |
| Nvidia | no | no | (0.60, 0.13) | (4.79, 2.59) |
| Nvidia | yes | no | (0.69, 0.15) | (3.24, 2.21) |
| Nvidia | no | yes | (0.64, 0.13) | (3.97, 2.17) |
| Nvidia | yes | yes | (0.69, 0.15) | (3.32, 2.27) |
| Resnet | no | no | (0.56, 0.13) | (5.55, 2.99) |
| Resnet | yes | no | (0.66, 0.16) | (3.90, 2.66) |
| Resnet | no | yes | (0.67, 0.15) | (3.56, 2.45) |
| Resnet | yes | yes | (0.68, 0.16) | (3.53, 2.47) |

<h5> Trajectories UPB Map</h5>

![](https://i.imgur.com/D6O5DjP.png)
![](https://i.imgur.com/c0KtbMb.png)

<h5> Unsupervised optic flow</h5>

![](https://i.imgur.com/jJqxvCY.jpg)

The model was trained on a small dataset (8900 images) for 8 epochs (it was still learning).

[Some samples](https://drive.google.com/drive/folders/1uEyyCMqmyQe6oU88av2dfAf3OAPmO2BA?usp=sharing)

<h3>Week 5</h3>

<h5> Old steering model problem</h5>

* Multiple possible actions for an input scenario

For the following situation, the model does not know whether the car is slipping off the track or is turning left. In the actual video, the car is turning left, but the model goes straight. Those scenarios (intersections) make the evaluation of the model difficult.

![](https://i.imgur.com/zUzgeY4.png)

To solve that, add a top-down view with the actual path. This is equivalent to adding a GPS route.
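Rendering such a top-down route view boils down to recentering the map points around the vehicle's pose, i.e. a 2D rigid transform into the ego frame. A minimal sketch (the function name and the frame convention, x forward / y left, are mine, not the project code):

```python
import numpy as np

def world_to_ego(points, position, heading):
    """Re-center map/route points around the vehicle.

    points:   (N, 2) array of points in world coordinates
    position: (2,) vehicle position in world coordinates
    heading:  vehicle heading in radians (0 = world +x axis)

    Returns the points in the ego frame: x forward, y to the left.
    """
    c, s = np.cos(-heading), np.sin(-heading)
    R = np.array([[c, -s],
                  [s,  c]])  # rotation by -heading
    # translate so the vehicle sits at the origin, then rotate
    return (np.asarray(points) - np.asarray(position)) @ R.T
```

A route point one metre ahead of the car always lands at (1, 0) in the ego frame, regardless of the car's world heading, which is exactly the "route straight ahead" signal the steering network would consume.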
* Construct the UPB campus map

![](https://i.imgur.com/S2nbM9l.png)

* Plot the route and recenter the map according to the vehicle's position (Green circle - vehicle, Red circle - destination, Blue curve - route, Gray curves - UPB's streets)

![](https://i.imgur.com/HgVxbZs.png)

<h5> Evaluation of the model with perspective transformations, without depth and without inpainting (81 test videos) </h5>

| Max distance between cars [m] | Max angle between cars [rad] | Intervention penalty [s] | Model | Mean Autonomy [%] | Mean Number of interventions |
| ----------------------------- | ---------------------------- | ------------------------ | ------ | ----------------- | ---------------------------- |
| 1.5 | 0.2 | 6 | NVIDIA | 90 | 0.73 |

The interventions appear on turns and when an obstacle needs to be avoided (e.g. cars, pedestrians). Interventions at turns are caused by the thresholds: the car turns, but the distance or the orientation exceeds the thresholds. This might be caused by the reduced field of view from cropping.

<h5> Evaluation of the model with perspective transformation, with depth and with inpainting </h5>

I don't have numbers yet (the evaluation takes too long on my laptop), but from the few tests, the bending artifacts present in the previous version do not influence the network's decisions. See the GIFs below:

[Steering trained with old augmentation and tested with old augmentations](https://drive.google.com/drive/folders/1Chn425Lt5FKwYx6vE9ZLX--WvcCnV7LP?usp=sharing)

[Steering trained with old augmentation and tested with new augmentations](https://drive.google.com/drive/folders/1zNk97Kt8Xu2xA4x19qycsgznVvghFlku?usp=sharing)

<h5> Training snapshots</h5>

1. ![](https://i.imgur.com/HCRVJ4e.png)
2. ![](https://i.imgur.com/Un4f4lo.png)

<h3> Week 4 </h3>

<h5> Monodepth2 fine-tuned on UPB Dataset </h5>

| Column 1 | Column 2 |
| ------------------------------------ | ------------------------------------ |
| ![](https://i.imgur.com/W1xCbBq.png) | ![](https://i.imgur.com/fTZGItL.png) |
| ![](https://i.imgur.com/ge4HrHm.png) | ![](https://i.imgur.com/OCCCJaw.png) |
| ![](https://i.imgur.com/J52LaI8.png) | ![](https://i.imgur.com/bxH2prG.png) |
| ![](https://i.imgur.com/0djwbkC.png) | ![](https://i.imgur.com/7iGxcrZ.png) |
| ![](https://i.imgur.com/exBOhjv.png) | ![](https://i.imgur.com/Ent5xTN.png) |

<h5>Inpainting Nuscenes trained from scratch</h5>

Input, Output, Ground Truth

1. ![](https://i.imgur.com/6yD93VP.png)
2. ![](https://i.imgur.com/Je3yAmm.png)
3. ![](https://i.imgur.com/kQpEk5j.png)
4. ![](https://i.imgur.com/kzBbRLN.png)
5. ![](https://i.imgur.com/Stn6qiJ.png)
6. ![](https://i.imgur.com/UreszEb.png)
7. ![](https://i.imgur.com/acuX7YH.png)
8. ![](https://i.imgur.com/3jjap4g.png)
9. ![](https://i.imgur.com/VQiopFw.png)
10. ![](https://i.imgur.com/TVbTPdJ.png)

<h5>Inpainting UPB fine-tuned</h5>

Input, Output, Ground Truth

1. ![](https://i.imgur.com/rwa9N2u.png)
2. ![](https://i.imgur.com/oq9M84c.png)
3. ![](https://i.imgur.com/zyO13JL.png)
4. ![](https://i.imgur.com/xR5MI7k.png)
5. ![](https://i.imgur.com/IXrifY4.png)
6. ![](https://i.imgur.com/PblQH1K.png)
7. ![](https://i.imgur.com/sh4olcq.png)
8. ![](https://i.imgur.com/6JOYXgK.png)
9. ![](https://i.imgur.com/QAWriAA.png)
10. ![](https://i.imgur.com/MYfEu1G.png)

<h5> Pipeline </h5>

1. ![](https://i.imgur.com/no0zBJR.png)
2. ![](https://i.imgur.com/VUeBXmz.png)
3. ![](https://i.imgur.com/0nXN8cs.png)
4. ![](https://i.imgur.com/wTV6uAc.png)
5. ![](https://i.imgur.com/1Z8ybrO.png)
6. ![](https://i.imgur.com/HCrwefT.png)
7. ![](https://i.imgur.com/s1KepnZ.png)
8. ![](https://i.imgur.com/rrsJWrB.png)
9. ![](https://i.imgur.com/fTXczLe.png)
10. ![](https://i.imgur.com/0TXv0Eo.png)
11. ![](https://i.imgur.com/piZghl7.png)
12. ![](https://i.imgur.com/MeeQ8fL.png)
13. ![](https://i.imgur.com/y12RTkK.png)
14. ![](https://i.imgur.com/nXvgjqh.png)
15. ![](https://i.imgur.com/GhQ3ptW.png)

<h5> More augmentation results </h5>

| Column 1 | Column 2 | Column 3 |
| ------------------------------------ | ------------------------------------ | ------------------------------------ |
| ![](https://i.imgur.com/eRt83D6.png) | ![](https://i.imgur.com/ZLTtyqZ.png) | ![](https://i.imgur.com/ZjM5Sj7.png) |
| ![](https://i.imgur.com/15K7X58.png) | ![](https://i.imgur.com/410VhV4.png) | ![](https://i.imgur.com/7xNTlbd.png) |
| ![](https://i.imgur.com/bpjNGWt.png) | ![](https://i.imgur.com/WmhIub1.png) | ![](https://i.imgur.com/01Gfpav.png) |
| ![](https://i.imgur.com/bpjNGWt.png) | ![](https://i.imgur.com/0j5CjDV.png) | ![](https://i.imgur.com/xdpv9FQ.png) |
| ![](https://i.imgur.com/SbIYlaR.png) | ![](https://i.imgur.com/r9bnPSz.png) | ![](https://i.imgur.com/EghlTC1.png) |
| ![](https://i.imgur.com/75XB5Z6.png) | ![](https://i.imgur.com/Nt1xibF.png) | ![](https://i.imgur.com/9MPPtWa.png) |
| ![](https://i.imgur.com/TAHV6QI.png) | ![](https://i.imgur.com/ImfYvsZ.png) | ![](https://i.imgur.com/qywvG39.png) |
| ![](https://i.imgur.com/mDQPEmy.png) | ![](https://i.imgur.com/qlmbaV0.png) | ![](https://i.imgur.com/ICEPeq6.png) |
| ![](https://i.imgur.com/53gghZ1.png) | ![](https://i.imgur.com/4tSbZAl.png) | ![](https://i.imgur.com/ARkURrK.png) |
| ![](https://i.imgur.com/lOrfT62.png) | ![](https://i.imgur.com/THJEt1p.png) | ![](https://i.imgur.com/aQk27Ul.png) |
| ![](https://i.imgur.com/lOrfT62.png) | ![](https://i.imgur.com/XNfZawt.png) | ![](https://i.imgur.com/7UGjBnt.png) |
| ![](https://i.imgur.com/T6v80c9.png) | ![](https://i.imgur.com/D8Xx19u.png) | ![](https://i.imgur.com/TLhD4Hy.png) |
| ![](https://i.imgur.com/lPLi6xk.png) | ![](https://i.imgur.com/zImwQeD.png) | ![](https://i.imgur.com/ly2OINb.png) |
| ![](https://i.imgur.com/erUmQNO.png) | ![](https://i.imgur.com/lf8Y4zV.png) | ![](https://i.imgur.com/D9PDZmw.png) |

<h5> Training pipeline small dataset </h5>

![](https://i.imgur.com/M8DDBvW.png)

<h5> Evaluation using simulator </h5>

[Sample GIF](https://drive.google.com/file/d/1-WPvVMB0xyIFhnKu3W5M21t6qncKdLc1/view?usp=sharing)

<h3> Week 3 </h3>

<h5>Depth factor computation</h5>

Transform from pixel coordinates to camera coordinates:
$$ [p_x, p_y] \rightarrow [p_x, p_y, 1] \rightarrow [p_x \cdot z_{pred}, p_y \cdot z_{pred}, z_{pred}] \rightarrow [c_x, c_y, c_z] $$

Sample pixels in front of the car in a box. Let $(h, w)$ be the height and the width of the image. The box is delimited by $(h-10:h) \times (w - 50: w + 50)$. Take the median value of the $c_y$ coordinate (call it $m_y$). We know that the camera is situated $1.5[m]$ above the ground. The factor is computed as follows:
$$ f = \frac{1.5}{m_y} $$

Example of translation with $0.88[m]$:

![](https://i.imgur.com/6mk45SZ.png)

<h5> Pipeline </h5>

<ul>
<li>Fine-tuned Monodepth2 on the UPB dataset for a restricted dataset (still have problems when using the entire dataset)</li>
<li>Inpaint trained on NuScenes. No fine-tuning. </li>
</ul>

1. ![](https://i.imgur.com/7EVArQE.png)
2. ![](https://i.imgur.com/V6N5XLg.png)
3. ![](https://i.imgur.com/BiGcQfF.png)
4. ![](https://i.imgur.com/veczNXX.png)
5. ![](https://i.imgur.com/lsU02fl.png)
6. ![](https://i.imgur.com/cSIdwKC.png)

<h5> Other preparation </h5>

<ul>
<li>Prepare the NuScenes dataset for a new inpaint training. This will be the model to be fine-tuned on UPB.</li>
<li>Prepare the UPB dataset for inpaint fine-tuning.</li>
<li>Trying to fix the fine-tuning of Monodepth2</li>
</ul>

<h5> Summary Monodepth2 </h5>

Formulate the problem as minimization of a photometric reprojection error at training time.
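The Week 3 depth-factor computation above can be sketched in a few lines. The intrinsics-based back-projection and the horizontally centered sampling box are my reading of the $(h-10:h) \times (w-50:w+50)$ window; a sketch under those assumptions, not the project code:

```python
import numpy as np

def depth_scale_factor(depth, K, cam_height=1.5, box_h=10, box_half_w=50):
    """Metric scale factor for a predicted depth map.

    depth: (h, w) predicted depth z_pred
    K:     3x3 camera intrinsics
    Assumes the bottom-center box images flat ground and the camera
    sits `cam_height` metres above it (1.5 m in the journal).
    """
    h, w = depth.shape
    fy, cy = K[1, 1], K[1, 2]
    # box of pixels in front of the car: last box_h rows, centered columns
    rows = np.arange(h - box_h, h)
    cols = np.arange(max(0, w // 2 - box_half_w), min(w, w // 2 + box_half_w))
    z = depth[np.ix_(rows, cols)]
    # back-project pixel rows to camera y: c_y = (v - cy) / fy * z_pred
    c_y = (rows[:, None] - cy) / fy * z
    m_y = np.median(c_y)
    return cam_height / m_y
```

The return value is exactly $f = 1.5 / m_y$ from the formula above; multiplying the predicted depth map by `f` puts it on a metric scale.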
Express the relative pose for each source view $I_{t^\prime}$, with respect to the target image $I_t$'s pose, as $T_{t \rightarrow t^\prime}$. Predict a dense depth map $D_t$ that minimizes the photometric reprojection error $L_p$, where:
$$ L_p = \sum_{t^\prime} pe(I_t, I_{t^\prime \rightarrow t}) $$
and
$$ I_{t^\prime \rightarrow t} = I_{t^\prime} \Big \langle proj(D_t, T_{t \rightarrow t^\prime}, K) \Big \rangle $$

Use $L_1$ and $SSIM$ to build the photometric error function $pe$:
$$ pe(I_a, I_b) = \frac{\alpha}{2} (1 - SSIM(I_a, I_b)) + (1 - \alpha) \|I_a - I_b\|_1 $$
where $\alpha=0.85$.

Edge-aware smoothness:
$$ L_s = |\partial_x d_{t}^{*}| e^{-|\partial_x I_t|} + |\partial_y d_{t}^{*}| e^{-|\partial_y I_t|} $$
where $d_t^{*} = d_t / \overline{d_t}$ is the mean-normalized inverse depth, used to discourage shrinking of the estimated depth.

Improvement:
$$ L_p = \min_{t^\prime} pe(I_t, I_{t^\prime \rightarrow t}) $$

![](https://i.imgur.com/qtAXpvr.png)

Auto-masking method that filters out pixels which do not change appearance from one frame to the next in the sequence:
$$ \mu = [ \min_{t^\prime} pe(I_t, I_{t^\prime \rightarrow t}) < \min_{t^\prime} pe(I_t, I_{t^\prime})] $$
where $[\,]$ is the Iverson bracket.

Multiscale estimation: instead of computing the photometric error on the ambiguous low-resolution images, first upsample the lower-resolution depth maps (from the intermediate layers) to the input image resolution, and then reproject, resample, and compute the error $pe$ at this higher input resolution.
This multiscale error is still the photometric term $L_p$.

<b>Final Training Loss: </b>
$$ L = \mu L_p + \lambda L_s $$

<h3> Week 2 </h3>

<h5> Monocular depth prediction fine-tuned on the UPB Dataset (640x320 orig -> 512x256) </h5>

| Column 1 | Column 2 |
| ------------------------------------ | ------------------------------------ |
| ![](https://i.imgur.com/8VBWkM8.png) | ![](https://i.imgur.com/BYzzTHW.png) |
| ![](https://i.imgur.com/fm6vQSz.png) | ![](https://i.imgur.com/xbwVNXD.png) |

<h5> Perspective augmentation without inpainting (256x128) </h5>

| Column 1 | Column 2 |
| ------------------------------------ | ------------------------------------ |
| ![](https://i.imgur.com/JsiC9HN.png) | ![](https://i.imgur.com/XYxSHXZ.png) |
| ![](https://i.imgur.com/OzK1okM.png) | ![](https://i.imgur.com/oPvayCh.png) |

<h5> Inpainting linspace </h5>

| Column 1 | Column 2 | Column 3 | Column 4 | Column 5 |
| -------- | -------- | -------- | -------- | -------- |
| ![](https://i.imgur.com/c8deGAt.jpg) | ![](https://i.imgur.com/mb35QY4.jpg) | ![](https://i.imgur.com/38jqsSW.jpg) | ![](https://i.imgur.com/xZrtF3A.jpg) | ![](https://i.imgur.com/gYBe9nd.jpg) |
| ![](https://i.imgur.com/diQcz7o.jpg) | ![](https://i.imgur.com/4zCmiD5.jpg) | ![](https://i.imgur.com/iUPokxY.jpg) | ![](https://i.imgur.com/qi9ckYR.jpg) | ![](https://i.imgur.com/BQso12f.jpg) |

<h3> Week 1 </h3>

<ul>
<li>Iterative translation/rotation visualization</li>
<li>Solve monodepth2 training</li>
<li>Use monodepth2 & generative model to see performance on Kitti/Nuscenes</li>
<li>Find a method to evaluate the correctness of the generative model</li>
</ul>
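For reference, the autonomy numbers in the closed-loop evaluations above are consistent with the NVIDIA-style metric with a 6 s penalty per intervention; a minimal sketch (the function name is mine):

```python
def autonomy(video_time, num_interventions, penalty=6.0):
    """Fraction of time the car drives itself.

    Each intervention is charged `penalty` seconds of human driving,
    so autonomy = 1 - human_time / (video_time + human_time).
    """
    human_time = num_interventions * penalty
    return 1.0 - human_time / (video_time + human_time)
```

For example, the Week 11 baseline (620 interventions over 3060.16 s of video) gives 1 - 3720 / 6780.16 ≈ 0.45, matching the reported value; the denominator is exactly the "Videos Time + Time Penalty" column.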