Troubleshoot.sh Community meeting

Documentation: https://troubleshoot.sh
Project repo: https://github.com/replicatedhq/troubleshoot
Roadmap: https://github.com/orgs/replicatedhq/projects/4/views/1

Subscribe to the Replicated Community Calendar for the latest updates on community meetings for Replicated Open Source projects

Join #app-troubleshoot on the Kubernetes Slack.

Intent: This meeting is an opportunity for folks involved in the Troubleshoot project to map out the project's future and roadmap, and to discuss the ideas and features that are important to them.

Regular Agenda:

  • Current state of the project
  • Gaps and needs
  • Prioritization
  • meeting chairs
  • open floor

Add items to the agenda in the following format:

  • $git_username: general outline of item for discussion
    • enter any details in a sub-bullet
    • add as much detail as you can
  • $git_username: item #2

Troubleshoot.sh Community Meeting - Mar 6, 2024 1930 UTC

  • Current state of the project
  • Gaps and needs
  • Prioritization
  • meeting chairs
  • open floor

Troubleshoot.sh Community Meeting - Dec 6, 2023 1930 UTC

Troubleshoot.sh Community Meeting - Nov 1, 2023 1930 UTC

Troubleshoot.sh Community Meeting - Oct 4, 2023 2030UTC

  • Current state of the project
  • Gaps and needs
  • Prioritization
  • meeting chairs
  • open floor

Troubleshoot.sh Community Meeting - August 2, 2023 2030UTC

https://replicated.zoom.us/j/88452766694?pwd=VFRXYmhxNHpzRGY1RkFTdythR0xpZz09

Troubleshoot.sh Community Meeting - July 20, 2023 1030UTC

https://replicated.zoom.us/j/88452766694?pwd=VFRXYmhxNHpzRGY1RkFTdythR0xpZz09

  • Current state of the project
  • Gaps and needs
  • Prioritization
  • meeting chairs
  • open floor

Troubleshoot.sh Community Meeting - July 5, 2023 2030UTC

https://replicated.zoom.us/j/88452766694?pwd=VFRXYmhxNHpzRGY1RkFTdythR0xpZz09

  • Current state of the project
    • Stable API and Consolidated CLI in review
  • Gaps and needs
  • Prioritization
  • meeting chairs
  • open floor
    • @danj-replicated: Uploading support bundles and preflights implementation requires review. Current implementation is fragile and also differs between preflight and support-bundle workflows.

Troubleshoot.sh Community Meeting - May 31st, 2023 2030 UTC

  • Current state of the project
    • Enhancements to work better directly with Helm
      • Exit code added so we can use it as a CLI tool in a script and detect the outcome
      • Added stdin support to preflight so that it can handle a stream of output and find the relevant manifests for itself
  • Gaps and needs
    • Combining support bundle and preflight into a single concept
      • Make the schema for both the same
    • Preflights for checking required versions
      • Maybe Troubleshoot can already do this and we just need an example of the necessary templating
    • Run all collectors in cluster
    • Analyzers are too hard to write
      • Writing new ones too hard
      • Utilizing existing ones too hard
      • More examples needed, e.g. finding the files to analyze
      • Good example in slackernews: how would anyone not intimately familiar with the internals know how to analyze the Slack API return codes?
    • Would be nice if sbctl handled older bundles better
    • troubleshoot.io/ labels switch to troubleshoot.sh
  • Prioritization
  • meeting chairs
    • @drohnow
  • open floor
    • @drohnow
      • Search support bundle for errors automatically
      • @z4ce: Maybe use generative AI.. ChatGPT plugin

Troubleshoot.sh Community Meeting - May 18th, 2023 10:30 UTC

  • Current state of the project
  • Gaps and needs
    • @mhrabovcin: Improvements to the cluster resources collector
      • Try to find a more generic way of collecting all resources with an option of having a "filtering" mechanism where we might want to ignore certain resources
      • @banjoh: "kubectl cluster-info dump" does this already. We'd want to explore this
  • Prioritization
    • N/A
  • meeting chairs
    • @banjoh
  • open floor
    • @mhrabovcin: Troubleshoot live project
      • https://github.com/mhrabovcin/troubleshoot-live
      • @danj-replicated: We'd need to test this out to see how it works. sbctl has some limitations that troubleshoot-live solves such as surfacing CRDs
      • Launches KAS and ETCD and creates all the resources via k8s API
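
The "filtering" mechanism raised under gaps and needs could look roughly like the sketch below: collect everything, then drop resources that match ignore patterns. This is only an illustration, not Troubleshoot code. The function name, the glob-style pattern syntax, and the resource names are all assumptions made for the example.

```python
from fnmatch import fnmatchcase


def filter_resources(resources, ignore_patterns):
    """Return the resources that match none of the ignore patterns.

    Hypothetical helper: patterns use shell-style globs (e.g. "events*").
    """
    return [
        name
        for name in resources
        if not any(fnmatchcase(name, pattern) for pattern in ignore_patterns)
    ]


# Example resource names, chosen only for illustration.
discovered = ["pods", "secrets", "events.events.k8s.io", "deployments.apps"]
kept = filter_resources(discovered, ["secrets", "events*"])
```

A deny-list keeps the default behavior "collect everything", which matches the goal of a more generic collector.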

Troubleshoot.sh Community Meeting - May 3rd, 2023 2030 UTC

  • Current state of the project
  • Gaps and needs
    • anyone have issues that aren't getting the priority they need?
      • Dan: support-bundle hanging - need more info; we can't get debugging output from the environment where we are seeing this. Needs more investigation. It's possible this happens when the containers that run the collector go into ImagePullBackOff
    • anyone have new issues/requests that aren't yet documented?
    • @York Chen: (D2iQ) have forked Troubleshoot, want to keep up to date
  • Prioritization
  • meeting chairs
    • @z4ce
  • open floor
    • Docs site search isn't intuitive
      • no CLI docs; there are docs in the Troubleshoot repo but not in troubleshoot.sh
      • maybe add a PR to GitHub Actions that updates the docs when the markdown in the repo changes.

Troubleshoot.sh Community Meeting - April 5th, 2023 2030 UTC

Regular Agenda:

  • Current state of the project
  • Gaps and needs
    • Improvements to analyzers, they aren't getting a ton of use
      • How can we make finding fields in returned objects better and more approachable?
        • Even if objects are collected by different collectors
      • Writing analyzers in Go for full logic control is cumbersome
        • Can we do things like allow us to use shell commands?
        • Maybe just run external tooling/scripts?
        • This does pull in a lot of host dependencies and failure points
        • Maybe instead we make it much easier to create boilerplate
        • If we need 'jq' style selections is that what people are actually trying to get at anyway?
    • Metrics collection
      • Prometheus, metrics-server, Loki, etc. collection?
      • Can we capture data on things like cpu/memory over time?
      • Alternatively, collect Alertmanager or Prometheus alert information?
      • Ex: CPU starvation and OOM kills
      • Loki or other out-of-cluster items would likely need user arguments
    • Collectors
    • SDK/CLI consolidation
      • Projects importing Troubleshoot don't have a clearly defined stable API today
      • The CLIs aren't consistent in usage
      • Preflight and support-bundle may not have any difference between them anymore
      • Consider building a troubleshoot analyse|collect|server|etc which would also become the reference for the SDK for how to import and use Troubleshoot
    • Helm pre-flights / support bundles?
  • Prioritization
    • Getting sbctl and the CLI/SDK implemented feels like something we should consider high priority.
      • These can both enable other projects and increase the user base which will in the end expedite good analyzer creation when we do polish it. More feedback from users is good.
  • meeting chairs
    • @xavpaice
  • open floor

Troubleshoot.sh Community Meeting - March 16th, 2023 0930 UTC

Regular Agenda:

Troubleshoot.sh Community Meeting - February 16th, 2023 0900 UTC

Regular Agenda:

  • Current state of the project
  • Gaps and needs
  • Prioritization
  • meeting chairs
    • volunteer to lead next month's community meeting
  • open floor
    • API stability matters
      • @danj-replicated: When using Troubleshoot as a library, parsing specs is not a very straightforward experience
      • @danj-replicated: Reading files from support bundles is not very intuitive. This would affect anyone who wants to discover what goes where. A suggestion was to have collectors be the source of truth of where they store their files.

Troubleshoot.sh Community Meeting - February 1st, 2023 0730 UTC

  • Current state of the project
  • Gaps and needs
  • Prioritization
  • meeting chairs
    • volunteer to lead next month's community meeting (US hours)
      • Most of the folks at Replicated are away at an off-site during the next one
  • open floor
    • @xavpaice: Google Summer of Code - do we want to register as a project?
    • Where to run collectors: running in cluster (i.e. in a pod) can give different results than running on the host. When running from the CLI, we can use runPod to collect info from inside the cluster, and other collectors might want that same functionality. However, if we're running Troubleshoot from inside a cluster/pod, we might not want to spin up a new pod for that.
    • Discussion about Helm, using for preflights and potentially vendoring in Troubleshoot. Current discussion is around calling other programs as a step during the install. Troubleshoot is controlled by the chart author rather than Helm.
    • ability to use privileged containers would allow some of the host collector functionality to run from inside the cluster
    • Conclusions:
      • other projects are interested in importing Troubleshoot, having a stable API would help that immensely

Troubleshoot.sh Community Meeting - January 19th, 2023 0900 UTC

Troubleshoot.sh Community Meeting - January 4th, 2023 2030 UTC

  • Current state of the project
  • Gaps and needs
    • @adamancini: version info, "upgrade" subcommand
      • can't get version from sbctl today, but nice to know if you can upgrade
      • it's not on brew/pkg managers, so "get the latest release tarball" involves some clicking into GitHub. We don't have a "latest" shortcut
      • Additionally collect this into the bundle so you can tell what version it was collected from
      • support-bundle --debug ./bundle.tar.gz should maybe tell me what version of support-bundle was used to generate the bundle, since some of the bundle features are in the tarball itself, like logs support
    • @adamancini: one CLI to rule them all?
    • Compound conditions
      • Should we try to mirror how K8s does this?
    • json compare: only has equality, would be valuable to add others
      • We should consider how to use other libraries to enable a range of comparisons in all analyzers
  • Prioritization
    • Increasing adoption of new features in other projects (e.g. KOTS)
    • Enabling analyzers to operate on kubectl API and not file paths
      • multiple collector analyzers
      • decouple analyzers from collectors
      • Dependent on getting sbctl into the project
  • meeting chairs
    • volunteer to lead next month's community meeting
      • @xavpaice
  • open floor
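
As a rough illustration of the "json compare" gap noted above (equality only today), the sketch below shows how a small operator table could enable a range of comparisons. It is not Troubleshoot's implementation: the operator names, the dotted-path lookup, and both function names are assumptions made for this example.

```python
import operator

# Hypothetical operator names; the real analyzer syntax may differ.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def lookup(document, path):
    """Walk a dotted path such as 'status.replicas' through nested dicts."""
    value = document
    for key in path.split("."):
        value = value[key]
    return value


def compare(document, path, op, expected):
    """Apply the named comparison to the value found at path."""
    return OPERATORS[op](lookup(document, path), expected)


collected = {"status": {"replicas": 3}}
enough_replicas = compare(collected, "status.replicas", ">=", 2)
```

Keeping the operators in one table would let every analyzer share the same set of comparisons, which is the point raised about reusing libraries across analyzers.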

Troubleshoot.sh Community Meeting - November 30th, 2022 2030 UTC

  • Current state of the project
    • @xavpaice: Good momentum
    • @xavpaice: Roadmap for the next three months to match goals of project
  • Gaps and needs
    • @xavpaice: Need to support multiple preflight specifications
    • @z4ce: have support-bundle accept type: preflight specs
      • alternately, consolidate preflight and support-bundle, so that they're the same thing but run at different times
      • having a support bundle generated by preflights would be really useful
  • Prioritization
    • log collectors - limit collection by size, as well as lines/age. TODO: check the size of the task
    • sbctl integration to the Troubleshoot repo (spec doc in progress)
  • meeting chairs
    • volunteer to lead next month's community meeting
      • Chris
  • open floor
    • Helm plugin for running preflights etc., it's been evaluated prior but determined to not be ideal (not sure of reasons)
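
The log collector item under Prioritization (limit by size as well as by lines) can be sketched as follows. This is a minimal illustration under assumptions, not the collector's actual logic: the function name and parameters are hypothetical, the age limit is left out, and sizes are counted as UTF-8 bytes plus one newline per line.

```python
def limit_log(lines, max_lines=None, max_bytes=None):
    """Keep the most recent lines that fit within both limits."""
    if max_lines is not None:
        lines = lines[-max_lines:] if max_lines > 0 else []
    if max_bytes is None:
        return list(lines)

    kept = []
    used = 0
    # Walk backwards so the newest lines are kept first.
    for line in reversed(lines):
        size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if used + size > max_bytes:
            break
        kept.append(line)
        used += size
    kept.reverse()
    return kept


log = ["a", "bb", "ccc", "dddd"]
tail = limit_log(log, max_lines=3, max_bytes=9)
```

Applying the line limit first and the byte limit second means the result always satisfies both, whichever is stricter.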

Troubleshoot.sh Community Meeting - November 2nd, 2022 2030 UTC

Join the meeting on Zoom using this link

  • Current state of the project
    • hunting through the tar for the pod logs is difficult
      • changing location that the logs collector stores logs is in progress, but difficulties with symlinks
    • identifying file names, especially when trying to write an analyzer or preflight
      • This could help: https://github.com/replicatedhq/troubleshoot/pull/780
      • would be nice to run sbctl in an analyzer
        • There has been a discussion about moving sbctl into the project, and with that possibly using it as a way to write analyzers against the K8s API
        • This might also benefit from adding a proper entrypoint and subcommands to the project at the same time, enabling subcommands more like kubectl with verbs and nouns
  • Gaps and needs
  • Prioritization
    • log file access via sbctl (This has been in-flight with symlinks but has gotten harder than expected)
    • API-based access to Kubernetes data rather than file scraping (ex: sbctl for analyzers and users in-project)
  • meeting chairs
    • volunteer to lead next month's community meeting
      • Ian
  • open floor
    • Ian: What can/should we be considering to increase visibility for the community and get more members involved?
      • Ada: More tutorials and blogs for periodic releases and presentations would help introduce people
      • Ian: We should review and reach out to projects using Troubleshoot (D2iQ, EKS Anywhere, etc.)
  • @xavpaice: IP address redaction https://github.com/replicatedhq/troubleshoot/issues/735
    • Options: replace with tokens, redact by default and have an option to disable, redact only when instructed to and not by default
      • Either way this was a feature in a previous Troubleshoot and would be useful regardless of default decision
    • consensus: discuss with KOTS about a deprecation period, best is to default to not redact IPs
  • @xavpaice: any thoughts on reducing duplication of code (collection in particular), making a stable API, and general simplification?
    • @chris-sanders: How does this inform the idea of moving sbctl into the project proper, if at all?
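
For the IP address redaction item above, the "replace with tokens" option might work along the lines of this sketch: each distinct address maps to a stable placeholder, so lines in a bundle can still be correlated. It is an illustration only, not the Troubleshoot redactor. The token format and function name are assumptions, and the pattern matches IPv4-shaped strings without validating octet ranges.

```python
import re

# Four dot-separated groups of 1-3 digits; octet ranges are not checked.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact_ips(text, tokens=None):
    """Replace each distinct IPv4 address with a stable placeholder token."""
    tokens = {} if tokens is None else tokens

    def replace(match):
        ip = match.group(0)
        if ip not in tokens:
            # Hypothetical token format, chosen for this example.
            tokens[ip] = f"***IP_{len(tokens) + 1}***"
        return tokens[ip]

    return IPV4_PATTERN.sub(replace, text), tokens


redacted, mapping = redact_ips("10.0.0.1 -> 10.0.0.2, retry 10.0.0.1")
```

Passing the same tokens dict across files would keep tokens consistent over a whole bundle.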

Troubleshoot.sh Community Meeting - October 20th, 2022 0900 UTC

Join the meeting on Zoom using this link

Attendees: Martin Hrabovicin, Evans Mungai, Dan Jones, Edgar Lanting, Xav Paice

  • Current state of the project

  • Gaps and needs

    • mailing list for the project? (TODO: Xav)
    • IRC channel (TODO: Xav)
    • https://github.com/mesosphere/troubleshoot fork
    • idea to add a generic runtime arg option to specs, which could work in a similar way to CLI options
    • add a dry-run option
    • have some options to change default behavior in the spec (e.g. which default redactors run)
    • pluggable collectors, a means to run a custom collector that's not upstream (see Velero for example)
  • Prioritization

    • IP address redaction change is a quick win
    • concurrency of collectors
    • design and understanding for pluggable collectors/analyzers/redactors
  • meeting chairs

    • volunteer to lead next month's community meeting
  • open floor

  • @xavpaice: Speed of collection

  • @xavpaice: Efficiency of redaction

    • Redaction is adding about 16 seconds when you are limited to 1 cpu.
  • @xavpaice: Stable API for projects to import

    • Projects (KOTS, kURL, EKS Anywhere) are importing parts of Troubleshoot and running them. We should consider a stable API so we do not make breaking changes that affect those projects.
  • @xavpaice: sbctl - do we replace this with a ‘real’ k8s API?

Troubleshoot.sh Community Meeting - July 7, 2022 1100 PT/ 1400 ET

  • No topics; meeting ended

Troubleshoot.sh Community Meeting - June 3, 2022 1100 PT/ 1400 ET

Join the meeting on Zoom using this link

  • item 1
  • item 2
  • next chair
  • open floor

Troubleshoot.sh Community Meeting - May 6, 2022 1300 PT/1600 ET

Join the meeting on Zoom using this link

  • v0.32.0 release
  • @OGtrilliams KubeCon EU updates
    • KubeCon webinar will be livestreamed on Replicated YouTube
  • meeting chairs
    • volunteer to lead next month's community meeting
    • @divolgin to chair next month's meeting (TENTATIVE)
  • open floor

Troubleshoot.sh Community Meeting - April 7, 2022 1100 PDT/1400 EDT

Join the meeting on Zoom using this link

Troubleshoot.sh Community Meeting - March 10, 2022

Join the meeting on Zoom using this link

  • sbctl overview with @divolgin
  • open floor

Troubleshoot.sh Community Meeting - March 3, 2022

Interview with Chris Sanders

Troubleshoot.sh Community Meeting - February 3, 2022 11:00 AM PST / 2:00 PM EST

Zoom link

Troubleshoot.sh Community Meeting - December 2, 2021 11AM PST/2pm EST

Join the meeting using the following Zoom link: https://replicated.zoom.us/j/84125433779?pwd=ZHAwUFFid2thdzM2Rzdxek05cG1udz09 (ID: 84125433779, passcode: 6An1Rpp9)

Troubleshoot.sh Community Meeting - November 4, 2021 11AM PDT/2PM EDT

Join the meeting using the following Zoom link: https://replicated.zoom.us/j/84125433779?pwd=ZHAwUFFid2thdzM2Rzdxek05cG1udz09

  • @programmerq - ability to specify namespace and/or selectors at runtime (templated support bundle definition?) to accommodate a bundle for a specific given instance of an application. For cases where there may be multiple instances of the application that vary by namespace, deployment names, labels, etc. https://github.com/replicatedhq/troubleshoot/issues/481
  • @programmerq - ability to determine storageclass capabilities in analyzers. conditions based on provisioner, allowVolumeExpansion, or anything else that may come up. https://github.com/replicatedhq/troubleshoot/issues/482
  • @ogtrilliams - open floor

Troubleshoot.sh Community Meeting - 11 AM PDT/2 PM EDT October 7, 2021

Join the meeting using the following Zoom link: https://replicated.zoom.us/j/81568123981?pwd=a0lFSXpoVXA4bkJVamVyUTdNdFZodz09

Troubleshoot.sh Community meeting - 11am PDT/2PM EDT September 2, 2021

Join the meeting using the following Zoom link: https://replicated.zoom.us/j/89062276386?pwd=dHhMMmpBRWUyYzhOZDh5cEFLRFRsQT09

  • @murphybytes: - Discuss Remote Host Collector feature by @croomes
  • open floor?

Action items

John Murphy (@murphybytes) has volunteered to work with @croomes to refine PR #392

Troubleshoot.sh Community Meeting - 11:00 AM PST August 5, 2021

Join the meeting with the following Zoom link: https://replicated.zoom.us/j/89062276386?pwd=dHhMMmpBRWUyYzhOZDh5cEFLRFRsQT09

  • @dexhorthy - Aggregating awesome SupportBundle and Preflight specs from the wild

  • @divolgin - better process for reviewing and merging community contributions

    • e.g. https://github.com/replicatedhq/troubleshoot/pull/392 by @croomes
      • currently being worked on by John Murphy
    • ogtrilliams will work w/ John Murphy to update community guidelines & investigate CI/CD platforms
    • dedicated reviewers list?
    • develop pre-vetting process
    • ogtrilliams to create a process where potential contributors fill out an issue template outlining the proposed contribution, which is sent to a reviewer board. Once approved, a PR can be submitted.
  • @divolgin - Things that make support bundle hard to use

    • Analyzers are hard to troubleshoot when they don't work.
    • File names produced by collectors are hard to figure out when result is used with analyzers.
    • Collectors may never complete and there is no global timeout.
  • @emosbaugh - this is such a large change to host preflights. is this a direction we want to take them? should this be a "regular" preflight?

  • @marccampbell - Replace the CLA with a DCO?

    • will be implemented in ~1 week's time
  • Open floor

action items

  • Contributing guide will be started by John Murphy
  • look into creating public version of design plans for troubleshoot.sh