Design
Elastic Recheck Documentation: https://docs.openstack.org/infra/elastic-recheck/
Elastic Recheck Dashboard: http://status.openstack.org/elastic-recheck/
Elastic Recheck is available upstream, but it has been very difficult to get queries merged into the upstream database. Sorin Sbarnea is focused on revitalizing the project and getting core rights.
Goal: Make elastic recheck provide meaningful feedback for tripleo across all of our use cases. To provide recheck services across all of these use cases, the first step will be to containerize the service to make it more portable and maintainable.
ER is made out of several small services:
dashboard: apache httpd serving static pages and also a json file with the graphs.
watch-bot: notifies irc-channels about various matches found (not a priority for us, yet)
crons:
elastic-recheck-all: @30min - calls elastic-recheck-graph and produces a json file
elastic-recheck-gate: @30min - same
elastic-recheck-uncat: @30min - calls the elastic-recheck-uncategorized command, produces some html
watchbot: Polls gerrit using stream events for changes in order to detect zuul comments about finished builds.
Elastic recheck is organized as a python package that installs several CLI tools:
At the moment the deployment part is still done using the puppet-elastic-recheck repository, but we are working to replace it with ansible/container/compose. My goal is to make it easy to start a private instance locally with just "make run".
graph LR
gr -- load-config --> ./queries/*.yaml
gr -- http --> logstash.openstack.org
gr -- write --> output.json
gr(elastic-recheck-graph)
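The flow in the graph above can be sketched roughly as follows. This is a minimal illustration, not the actual elastic-recheck-graph code: `build_graph_data` and the injected `search` callable are hypothetical names, the queries are assumed to be already loaded from ./queries/*.yaml into a dict, and the layout of the output json is invented for the example.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def build_graph_data(
    queries: Dict[str, str],
    search: Callable[[str], List[int]],
    output_path: Path,
) -> dict:
    """Run every query through `search` and write the combined result as JSON.

    queries:     hypothetical mapping of bug id -> query string
    search:      hypothetical callable standing in for the HTTP call to
                 logstash; returns hit counts per time bucket
    output_path: where the json consumed by the dashboard is written
    """
    data = {}
    for bug_id, query in sorted(queries.items()):
        hits = search(query)
        data[bug_id] = {"query": query, "hits": hits, "total": sum(hits)}
    output_path.write_text(json.dumps(data, indent=2, sort_keys=True))
    return data


if __name__ == "__main__":
    # Stub search so the sketch runs without network access.
    def fake_search(query: str) -> List[int]:
        return [1, 0, 2]

    build_graph_data(
        {"1234567": 'message:"ReadTimeoutError: HTTPConnectionPool"'},
        fake_search,
        Path("output.json"),
    )
```

Passing the search function in as a parameter keeps the sketch testable locally, which fits the goal of running a private instance without depending on the upstream logstash.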
RDO Softwarefactory stores job logs in kibana
At the moment, queries have a 1:1 mapping to bugs in launchpad. We want to expand that to jira and bugzilla.
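One possible way to extend the 1:1 mapping beyond launchpad is to encode the tracker in the query file name. The sketch below is only an idea, assuming query files are named after the bug they track: the `lp`/`bz`/`jira` prefixes, the `parse_bug_ref` helper, and the example.com hosts are all hypothetical placeholders, not an existing elastic-recheck convention.

```python
from typing import NamedTuple

# Hypothetical URL templates. The jira and bugzilla hosts are placeholders.
TRACKERS = {
    "lp": "https://bugs.launchpad.net/bugs/{id}",
    "bz": "https://bugzilla.example.com/show_bug.cgi?id={id}",
    "jira": "https://jira.example.com/browse/{id}",
}


class BugRef(NamedTuple):
    tracker: str
    bug_id: str
    url: str


def parse_bug_ref(stem: str) -> BugRef:
    """Map a query file stem (file name without .yaml) to a bug reference.

    A bare stem such as "1234567" is treated as a launchpad bug; a prefixed
    stem such as "jira-OSP-42" selects another tracker.
    """
    tracker, sep, bug_id = stem.partition("-")
    if not sep:
        # No prefix: fall back to launchpad.
        tracker, bug_id = "lp", stem
    if tracker not in TRACKERS:
        raise ValueError(f"unknown tracker prefix: {tracker!r}")
    return BugRef(tracker, bug_id, TRACKERS[tracker].format(id=bug_id))
```

Splitting only on the first "-" keeps jira-style ids such as OSP-42 intact.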
Elastic recheck is a valuable service that can automatically inform developers of known issues that have caused their OpenStack builds and jobs to fail, either via a comment on their gerrit review or via a dashboard.
This is useful at any level and in any of the above use cases.
If all five use cases can be covered we have a single location to express known issues / bugs and have developers informed automatically regarding why deployments / jobs are failing.
We are picking up elastic-recheck as a project due to the lack of community at the moment; we see value in this tool not dying.
Emphasize the delineation between the upstream running instance and the query repo: hands off.
Goals:
https://docs.google.com/document/d/1s5v43HNwRy8X9CFeLhAXaOpakiK_XpJTaD_WB18KwTs/edit
To help ensure our results and hits are meaningful
build_name
build_status ( SUCCESS / FAILURE )
message:"ReadTimeoutError: HTTPConnectionPool" AND (tags:"console") AND voting:1 AND build_status:SUCCESS AND build_name:tripleo-*
message:"ReadTimeoutError: HTTPConnectionPool" AND (tags:"console") AND voting:1 AND build_status:SUCCESS AND NOT build_name:neutron*
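The two queries above differ only in how they filter on build_name. A small helper could compose such query strings consistently; this is a sketch only, and `build_query` and its parameters are hypothetical names, not part of elastic-recheck.

```python
from typing import Optional


def build_query(
    message: str,
    tags: str = "console",
    voting: int = 1,
    build_status: str = "SUCCESS",
    include_build_name: Optional[str] = None,
    exclude_build_name: Optional[str] = None,
) -> str:
    """Compose a query string with optional build_name filters."""
    parts = [
        f'message:"{message}"',
        f'(tags:"{tags}")',
        f"voting:{voting}",
        f"build_status:{build_status}",
    ]
    if include_build_name:
        # Restrict hits to matching jobs, e.g. tripleo-*
        parts.append(f"build_name:{include_build_name}")
    if exclude_build_name:
        # Drop hits from matching jobs, e.g. neutron*
        parts.append(f"NOT build_name:{exclude_build_name}")
    return " AND ".join(parts)


if __name__ == "__main__":
    print(build_query("ReadTimeoutError: HTTPConnectionPool",
                      include_build_name="tripleo-*"))
    print(build_query("ReadTimeoutError: HTTPConnectionPool",
                      exclude_build_name="neutron*"))
```

Called with `include_build_name="tripleo-*"` or `exclude_build_name="neutron*"`, the helper reproduces the two query strings shown above.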
"must_not": [
{
"fquery": {
"query": {
"query_string": {
"query": "build_name:neutron*"
}
},
"_cache": true
It's important to know how often certain tempest
tests are failing. There are a couple of ways to
do this, and I need to sync w/ Arx on this topic.
Add tempest run log to logstash
https://review.opendev.org/c/openstack/ansible-role-collect-logs/+/794664
the openstack-health way..
We could also build queries w/