# OVB Status: Increasing robustness of our OVB jobs
###### tags: `ruck_rover`
## Current situation with OVB jobs
Dozens of our OVB jobs fail due to intermittent issues each day! Rerunning and debugging jobs dozens of times a day is exhausting our rucks and rovers.
To get an idea of how bad the situation really is, refer to these ruck/rover (rr) notes:
* [2022-09-13 - 2022-09-15](https://hackmd.io/dKeK6zo9R66heikGyCb4NA)
* [2022-09-09 - 2022-09-12](https://hackmd.io/s4TgnCY-QQGKv2ONxTjOZA)
* [2022-09-01 - 2022-09-01](https://hackmd.io/94uNoMlnQgegrgy1iXV1kQ)
* [2022-08-26 - 2022-08-31](https://hackmd.io/7qAKWCiCQA6IEdE9WXYn4Q)
For example, it sometimes took up to 20 reruns of the same job to get a promotion of c9 master or c9 wallaby. Typical culprits during that period were c9 master/wallaby fs35 and fs64.
## Ideas for improvements
1. Do OVB jobs provide value?
2. How can we make better use of the IBM cloud?
3. Why are OVB jobs failing?
4. Can we reduce OVB testing in the check pipeline?
5. Gather results/pointers from previous efforts (we investigated memory usage and a tempest split?)
6. Can we make our promotion criteria smarter, e.g. pass if the corresponding internal featureset (fs) job passes? (See the sketch after this list.)
7. Are we triggering the OVB check job on the correct file changes?
8. Do the next-gen resources make the situation worse?
9. Use elastic-recheck to report back on patches for rdo-check job failures?
10. Find a partner from DF to help with debugging?
11. Doug's work on running all nodes on a single compute host
12. Are we running OVB as the right kind of test, i.e. hardware provisioning?
13. What are the actual errors?
14. [jm1] Most intermittent failures lead us to conclude that RDO/RHOS/TripleO is very sensitive to load or latency (CPU? network? disk?) of the underlying systems. We need help from the DFGs to make RDO/RHOS/TripleO more robust. Customers will likely run into some of these issues as well, but they will not know that a simple rerun could fix them, since no customer-facing document states this.
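
To make idea 6 a bit more concrete, below is a minimal Python sketch (not the actual promoter code or criteria configuration) of how a required OVB job could be waived when its corresponding internal featureset job passed. The job names and the OVB-to-internal mapping are made up for illustration.

```python
# Sketch of idea 6 (smarter promotion criteria): treat a required OVB job as
# satisfied if its corresponding internal featureset job passed.
# NOTE: job names, the mapping and the function below are hypothetical
# illustrations, not the real promoter code.

# Hypothetical mapping from an OVB job to its internal counterpart.
OVB_TO_INTERNAL = {
    "periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-master":
        "periodic-tripleo-ci-centos-9-internal-featureset001-master",
}


def criteria_met(required_jobs, passed_jobs):
    """Return True if every required job passed, either directly or via a
    passing internal counterpart (OVB jobs only)."""
    for job in required_jobs:
        if job in passed_jobs:
            continue
        internal = OVB_TO_INTERNAL.get(job)
        if internal and internal in passed_jobs:
            # The OVB job failed (or did not report), but the equivalent
            # internal featureset passed, so count it as satisfied.
            continue
        return False
    return True


if __name__ == "__main__":
    required = {"periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-master"}
    passed = {"periodic-tripleo-ci-centos-9-internal-featureset001-master"}
    print(criteria_met(required, passed))  # True: internal counterpart passed
```

This would only change the promotion decision, not the job results themselves, so failing OVB runs would still be visible for debugging.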