elastic-recheck purpose

tags: Design, Elastic

Update 2020-11-20

  • Upstream logstash does not support regex queries, but it may be possible to enable them if someone upgrades the deployment (a non-negligible effort)
  • We may be able to get quick results by refactoring the sova patterns out of sova. In fact, the entire ansible module would be better off outside the artcl project, which is too dependent on openstack

data sources

  • local files (when run as sova)
  • logstash upstream
  • logstash rdo
  • ? internal
  • upstream mysql database, which contains tempest results

query sources

  • sova-patterns.yml (~180pcs)
    • contains a mix of regex and logstash queries
  • e-r queries/ folder (~100pcs)
    • each query is linked to an LP bug
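
Since the two sources mix regex and logstash queries, one way to handle them together is to normalize every entry into a single record that states its kind. The sketch below is only an illustration: the `Query` fields, the `kind` values, and the sample entries are assumptions, not the actual layout of sova-patterns.yml or of the e-r queries/ folder.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Query:
    id: str                    # hypothetical unique name for the query
    kind: str                  # assumed values: "regex" or "logstash"
    expr: str                  # the regex pattern or the logstash query string
    bug: Optional[str] = None  # LP bug reference, when the source provides one


def match_local(queries: List[Query], text: str) -> List[str]:
    """Return the ids of regex queries that match a local log text."""
    return [q.id for q in queries if q.kind == "regex" and re.search(q.expr, text)]


def needs_log_index(queries: List[Query]) -> List[str]:
    """Return the ids of queries that can only run against a logstash server."""
    return [q.id for q in queries if q.kind == "logstash"]


# Made-up sample entries, for illustration only.
queries = [
    Query("conn-timeout", "regex", r"Connection timed out"),
    Query("oom-killer", "logstash", 'message:"Out of memory"', bug="example-bug"),
]
```

With this split, regex entries can run against local files (the sova case) without touching a log-index server, while logstash entries are routed to whichever logstash instance is available.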

findings

  • each changeset can create multiple buildsets, on multiple servers
  • each job has one log server (http) and potentially one log-index server.
  • some jobs may not have a buildset (jenkins?), but all of them should be able to identify a changeset
  • the gerrit event stream is a good source of events, but build results arrive with very long delays because the message appears only when the entire buildset ends
    • this means a linter failure may take 5h to be identified
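
The relationships in these findings can be written down as a small data model. This is a sketch under assumptions: the class name, field names, and URLs are hypothetical. The only constraints taken from the notes are that a changeset is always known, a job has exactly one log server, and both the log-index server and the buildset are optional.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Job:
    name: str
    changeset: str                       # every job must identify a changeset
    log_url: str                         # the one http log server location
    log_index_url: Optional[str] = None  # log-index server, if the job has one
    buildset: Optional[str] = None       # may be missing (e.g. jenkins jobs)


def group_by_changeset(jobs: List[Job]) -> Dict[str, Dict[Optional[str], List[Job]]]:
    """Group jobs as changeset -> buildset -> jobs; None holds jobs with no buildset."""
    grouped: Dict[str, Dict[Optional[str], List[Job]]] = {}
    for job in jobs:
        grouped.setdefault(job.changeset, {}).setdefault(job.buildset, []).append(job)
    return grouped
```

Keying on the changeset rather than the buildset lets individual job results be processed as they arrive, instead of waiting for the whole buildset to finish.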

challenges

  1. scalability
    • Performing ~300 queries on each build puts a lot of strain on the log servers
  2. result relevance
    • if you get 20 matches, which one is more important?
    • how do we distinguish between root causes and side effects?
      • some basic silencing heuristic will be needed to avoid noise
      • no AI, please!
  3. reporting
    • an in-job report is very useful but cannot be updated after log collection
    • gerrit comments can arrive with a delay and can also become spammy
    • irc notifications can easily become spam
    • dashboard (any html)
      • how do we decide what goes into the dashboard or not?
      • which queries deserve graphs or not?
        • track last known match for each query
        • track each job-match for a number of days
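
The result-relevance and dashboard-tracking ideas above can be sketched with plain rules, no machine learning involved. Everything here is an assumption made for illustration: the `priority` and `silences` fields, the retention window, and the class names do not come from sova or elastic-recheck.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Match:
    query_id: str
    priority: int = 0                        # higher means more important (assumed)
    silences: FrozenSet[str] = frozenset()   # side-effect queries hidden by this one


def rank_matches(matches: List[Match]) -> List[Match]:
    """Drop matches silenced by another match, then sort by priority."""
    silenced = set()
    for m in matches:
        silenced |= m.silences
    kept = [m for m in matches if m.query_id not in silenced]
    return sorted(kept, key=lambda m: (-m.priority, m.query_id))


class MatchHistory:
    """Track the last known match per query and recent job matches."""

    def __init__(self, retention_days: int = 14):
        self.retention = timedelta(days=retention_days)
        self.last_seen: Dict[str, datetime] = {}
        self.job_matches: List[Tuple[datetime, str, str]] = []

    def record(self, query_id: str, job_name: str, when: datetime) -> None:
        prev = self.last_seen.get(query_id)
        if prev is None or when > prev:
            self.last_seen[query_id] = when
        self.job_matches.append((when, query_id, job_name))

    def prune(self, now: datetime) -> None:
        """Forget job matches older than the retention window."""
        cutoff = now - self.retention
        self.job_matches = [e for e in self.job_matches if e[0] >= cutoff]
```

Note that two queries silencing each other would both be dropped by `rank_matches`, so a real rule set would need a check for such cycles. Pruning leaves `last_seen` untouched on purpose, so a query that has not matched in a long time can still be spotted and retired.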