Kepler Release Community Meeting Minutes
===

###### tags: `kepler` `release`

- **Meeting recordings:** https://youtube.com/playlist?list=PLz3pRD3kGUsd25nwuE-cDWgISktOpJmCx

- **Date:**
    - May 9, 2023
- **Agenda**
    - 0.5 release update
    - 0.6 planning: https://github.com/orgs/sustainable-computing-io/projects/2/views/1

- **Date:**
    - Apr 4, 2023
- **Agenda**
    - Development updates:
        - Dependent library update
        - cgroup v1 and v2 are both supported
        - CPU map update
            - https://github.com/sustainable-computing-io/kepler/pull/601
    - Any need to update models?
        - Validate the models
        - DRAM-intensive workload to train the DRAM model
        - VM case: how to get memory heuristics to use the DRAM model
    - Operator available on OperatorHub; a new release will be cut around 04/15
    - 0.5 release date
        - Around Apr 15th
    - Any critical PRs?
        - https://github.com/sustainable-computing-io/kepler/pull/609
        - https://github.com/sustainable-computing-io/kepler/pull/611
            - The motivation makes sense: reduce pod listing overhead and get pod info in real time.
            - Privilege escalation through access to the Docker socket is a concern; will discuss it with the OpenShift architect.
    - Critical issues
        - https://github.com/sustainable-computing-io/kepler/issues/610
        - https://github.com/sustainable-computing-io/kepler/issues/608
            - Add cgroup id
        - https://github.com/sustainable-computing-io/kepler/issues/594
        - https://github.com/sustainable-computing-io/kepler/pull/599 (?)
    - CNCF TAG demos
        - Sustainability (Apr 5th, 11 AM ET)
        - Runtime (Apr 6th, 11 AM ET)

- **Date:**
    - March 21, 2023
- **Agenda**
    - Development updates `30min`
        - Operator
        - Standalone Kepler
        - Multi-arch support
    - 0.6 release feature planning
        - Kepler project board update
        - Training on different CPU architectures
        - External power source support: BMC, Sentry
        - Scalability: Prometheus client (https://github.com/sustainable-computing-io/kepler/discussions/439)
        - Models documentation and usability improvement
    - Planning meeting date
        - End of Apr (after KubeCon EU) and end of May (end of OSS NA)

- **Date:**
    - March 07, 2023
- **Agenda**
    - New Kepler model server pipeline `20-25min`
        - Follow up on licensing of models and training data
    - Development updates `15min`
        - Operator
        - Standalone Kepler
        - TBD
    - Discussions `10-15min`
        - ACPI/IPMI/HMC power reading
        - cgroup/kubelet library import vs. reading from cgroup sysfs and the kubelet metrics endpoint

- **Date:**
    - Feb 21, 2023
- **Agenda**
    - Benchmark test intro `15min`
    - How to add a new Kepler model server pipeline `20-25min`
    - Discussions `5-10min`

- **Date:**
    - Feb 07, 2023
- **Agenda**
    - Current progress
    - Scalability issues
    - Testing
    - Deployment

:::info
- **Date:**
    - Jan. 24, 2023
- **Agenda**
    - 0.5 release planning
        - Release captain: Parul
    - Project planning
    - CNCF TAG presentation video

- **Date:**
    - Jan. 10, 2023
- **Agenda**
    - Bertrand introduces Sentry
        - HW agent/monitoring (BMC, etc.)
        - Supports OpenTelemetry, collecting HW metrics from a range of vendors (Cisco/HPE/etc.) through SNMP/Redfish/etc.
        - Power estimates based on HW specs
        - HW metrics: energy in joules (hw_host_power_watts, hw_host_energy_joules_counter)
        - Looking into VM support using a ratio-based approach
        - Sentry SW is free but not open source (legacy issue)
        - Demo: https://hws-demo.sentrysoftware.com/d/SHFBpSH7z/hardware-sentry-site?orgId=1&var-site=Sentry-Ottawa&from=now-2d&to=now
        - Architecture diagram: https://www.sentrysoftware.com/products/hardware-sentry.html
        - https://www.sentrysoftware.com/docs/hws-doc/3.0.00/index.html
    1. v0.4 release review `15min`
        - Feature
        - Deployment
        - Test
        - Doc
    2. v0.5 planning `30min`
        - https://github.com/orgs/sustainable-computing-io/projects/2
        - Feature: please add your thinking to the project board.
            - Sam: MVC-like architecture, OTEL support
        - Deployment
            - Helm chart: Kepler is included; Prometheus/Grafana is under consideration. GHA to generate more data during tests.
            - Operator: works for k8s, testing for OCP. v0.5 will support the model server. Bundles are ready for OperatorHub registration. cluster-admin for the operator?
        - Test: consolidated integration test using GitHub Actions (set up kind, install bcc), which can also be used for the operator. Test coverage improvement (especially tests silently bypassing panics), benchmark testing.
        - Doc
        - Model: offline to online upload. Node power modeling and verification of online training. Formalize accuracy testing of models.
    3. Discussion `15min`
        - Conference talks
        - CNCF TAG presentation: prepare an offline recording (5min each) and reserve a 15-20min TAG meeting slot for the presentation.
    4. Issues
        - Root privilege in deploying Kepler (check eBPF)
        - GPU support: HW spec support matrix (check the NVIDIA library supportability matrix), shared or dedicated GPU usage.
:::

:::info
- **Date:**
    - Dec. 13, 2022
- **Agenda**
    1. Update and progress `40min`
        - Kepler: no urgent tasks
            - GPU issue under debugging (thanks to David Gray)
            - CPU frequency: reading sysfs is expensive, and CPU frequency is only available on BM. Reading from the kernel tracepoint doesn't always work (it is activated only when the governor changes). Reading HW counters (cycles, hperf) can provide the real-time frequency: https://github.com/sustainable-computing-io/kepler/pull/427/files. A CPU time calculation issue was also identified and fixed. The PR removes many expensive calls and improves performance.
        - Kepler Test: CI, GH Action
        - Kepler Doc:
        - Operator: progress and release estimate
        - Helm: progress and next step (chart will be merged)
    2. Release decision `10min`
        - Kepler: Dec 16
        - Kepler Operator: week of Dec 19
        - Helm chart: TBD
    3. Welcome new committers `5min`
        - https://github.com/sustainable-computing-io/kepler/pull/459
    4. 2023 community meeting schedule `5min`
        - Starting mid-January, biweekly; Huamin to update the Zoom invite in README.md

- **Date:**
    - Nov. 29, 2022
- **Agenda**
    1. Update and progress `30min`
        - VM support and model estimator integration
            - Offline models are available; power estimation on VMs is working.
            - e2e test cases are available to verify Kepler container and node metrics. (Sam: can you add a test case to detect new Pods and their metrics? Huamin to add issues. The Prometheus client is used in e2e, so metrics reading can be added.)
            - The estimator sidecar will be verified and added to the e2e test (not in this release).
            - What is the status of online model training and updating? (Manual test first; Pang will share her sidecar and model server manifests.)
        - Test coverage
            - Improved from the low 30s to 39%; more unit tests are needed. (Huamin to investigate which pkgs need more tests and create issues. Sam: create tests for each pkg. Internal pointers/connectors make test cases hard to write; pointer value validation etc. needs a refactor. Some pkgs require the bcc library, making it hard for Mac users to add test cases; maybe a conditional build tag? Huamin will create issues for Mac/refactor.)
        - Deployment
            - Operator
                - [v1alpha1](https://github.com/sustainable-computing-io/kepler-operator/tree/v1alpha1) (Parul will send a demo based on kind. TODO: integrate the model server. OperatorHub integration will happen later. Sally will help with deploying on OpenShift/MicroShift.)
                - Tested kepler-exporter on kind; cluster prerequisites for OpenShift are WIP
                - Working on integration with the estimator and offline models
                - TO-DO Parul: document the features present in the current operator and what is expected in the next release.
            - Helm (https://github.com/sustainable-computing-io/kepler-helm-chart)
                - PR ready; Sam commented/reviewed. Tested on kind (validated exporter and output). Still working on Prometheus and Grafana (may be added in the future).
                - To investigate how to release the chart (maybe using GitHub Actions)
        - Docs
            - [Simplify end user doc](https://github.com/sustainable-computing-io/kepler/issues/418) (Nikki to make two contributions)
    2. Issues `20min`
        - [0 process](https://github.com/sustainable-computing-io/kepler/issues/422) (add logging)
        - [podEnergyStatLabels needs update](https://github.com/sustainable-computing-io/kepler/issues/408) (already in local repo, doesn't affect model training atm)
        - [block_device_used logging](https://github.com/sustainable-computing-io/kepler/issues/355) (let's lower the verbosity first)
    3. Questions and Discussions `10min`
        - Release criteria and date to be finalized
        - https://github.com/sustainable-computing-io/kepler/issues/333 (Estimator/Kepler: configmaps. Pang will share the examples in the discussion and docs PR.)
        - Should cgroup v1 be supported in cgroup-metrics-based models? (Let's document this and investigate more next release.)
        - Shared e2e on all repos:
            - Kepler (including estimator and model server)
            - Operator, Helm
:::

:::info
- **Date:**
    - Nov. 14, 2022
- **Agenda**
    1. Issues and progress `40min`
        - Issues
            - VM support, Estimator, Model Server usage
                - VM: CPU host passthrough tested (with perf counter metrics); the cgroup metric model comes next.
                - Development in the model branch: cgroup pkg issues found; cgroup metrics not reached. The work function is not finalized and is under debugging. Pang is working on it and will report an issue.
                - Estimator sidecar: tested before the metric refactoring. Config names are in env vars (also in the dev branch): set estimator to true. Huamin to add debugging to the sidecar.
                - Kaiyi: update the namespace in the model server to kepler (in the deployment and in the Service endpoint)
            - Process to ensure usability and performance
                - Marcelo: metrics doc (including samples) updated; new Grafana dashboard PR (not yet using all the metrics); all power sources are in their own metrics.
                - Separate metrics vs. aggregating in Prometheus: a performance hit on Prometheus should be avoided; aggregating on the Kepler side can help scalability.
                - Dedicated power source metrics can be combined through label-based aggregation. End users can query/check individual or aggregate metrics based on the basic metrics.
            - Docs
                - mkdocs vs. Sphinx vs. Hugo: kepler-docs needs a dev preview in the local env. Hugo is used by k8s but is not as easy as mkdocs. Sphinx provides API docs, but so does mkdocs. Sphinx is complicated, and its preview is inconsistent between GitHub Pages and the local VS Code plugin.
                - mkdocs also reports broken links; maybe a test is needed to ensure all links are valid (refer to https://github.com/redhat-et/microshift-documentation/blob/main/.github/workflows/broken-link-check.yml)
                - kepler-docs approvers: Marcelo, Parul, Pang
        - Update
            - Operator
                - v1alpha1 branch: specs defined, main reconciler, abstraction in placeholder. kepler-exporter: Parul; others: Kaiyi. Preview of kepler-exporter on bare metal in the next two weeks.
            - Helm
                - Helm chart PR drops today and will need review. Prometheus/Grafana integration.
                - Reviewer: Sally
        - Next
            - Test coverage, e2e testing
                - Pkgs that need test coverage
                    - Unit test:
                        - power pkg, simple to implement
                        - cgroup
                        - Complex ones: comments + TODO
                        - Simple test cases now; refactoring can come later.
                    - System level: library dependencies (how to mock them?); maybe borrow from k8s mock tests.
                    - How to run focused tests: VS Code Ginkgo plugin (Huamin and Sam to share the command, If/focus)
                - Basic e2e test cases
                    - Deploy a workload on a kind cluster
                    - Validate the Kepler metrics with that workload
                    - GH uses Ubuntu Server; manifests with eBPF may cause issues with Kepler (lib/modules bind mount)
                    - Build on Ubuntu or run in containerized mode.
                    - dind vs. VM on GH Actions: the flow of creating a Fedora-derived OS on GH VMs. kind is dind. The limitation of kind? eBPF/bcc library dependency. Sam to create issues.
    2. Questions `20min`
:::

:::info
- **Date:**
    - Nov. 1, 2022
- **Agenda**
    1. Walk through project board `40min`
        - Release criteria: urgent + high priority tasks done
        - Size
            - Size is used to determine the development time and deadline of the task
            - XL: 1+ months
            - L: 2 weeks - 1 month
            - M: 1-2 weeks
            - S: < 1 week
        - Early PRs are recommended.
        - If the anticipated deadline goes beyond the release date, the priority of the task is lowered and it may be moved to the next release.
    2. Logistics `10min`
        - Release tracking
            - Biweekly meeting (for dev)
        - Release date
            - Mid Dec (tentative, Dec 16th)
        - Release captain
            - Tracking the tasks and PRs; create tags for issues covering release, priority, size (Parul)
            - Tags that can be reused (Marcelo)
            - Document everything the release captain does so the process can be reused (Sally)
            - Manage PR merges
    3. Development process `10min`
        - task -> issue -> design -> PR -> test -> doc
        - Only task PRs before release; refactor PRs will be merged after release
        - When merge conflicts exist, high-priority PRs and small PRs are merged first
        - Feature PRs must have test cases (i.e. do not drop test coverage)
        - Bug fixes always have high priority
- **Participants:**
    - Huamin Chen
    - Chen Wang
    - Parul
    - Sally O'Malley
    - Sam Yuan
    - Kaiyi Liu
    - Chen Ji
    - Peng Hui Jiang
    - Marcelo Amaral
    - Sunyanan Choockotkaew
    - Ken Lu
    - Ruomeng Hao
:::

Note:
- Version scheme: incremental integer, decimal, periodical release
- Milestone definition:
    - Support all clouds
    - Accuracy of power measurement
    - Support all HW (x86, arm, s390x)
- Backlog project to track new ideas that are not covered in the current release

## Walk through project board

:dart: Goal
---

- e2e integration
    - Owner: Huamin (also includes e2e test)
    - Also include the operator for deployment
    - Also test the API (maybe with mock data, and a long-run test with real data)
    - (GPU test will be at risk)
    - Platform: CPU architecture (e.g. icelake), Linux distro (issues in 6.2 kernel) (Ken)
- Test coverage
    - Owner: Sally
    - Platform: CPU architecture (e.g. icelake), Linux distro (issues in 6.2 kernel) (Ken)
- Documentation
    - Owner: Marcelo (metrics), Parul (overall), Pang (model server/estimator)
- Basic feature
    - Owner: Chen Wang
- Deployment
    - Owner: Peng Hui Jiang (Helm) and Parul and Pang (Operator)

:books: Backlog
---

- Features
- Testing
- Documentation
- Deployment

:closed_book: Tasks
--

==Importance== (Urgent - Low) / Name / **Size** (Small - X Large)

### Feature

- PRs to review
    - Low / Detect kepler is running inside VM [PR #302](https://github.com/sustainable-computing-io/kepler/pull/302) / Small
        - Not needed at the moment
    - Medium / Use local model if no model server endpoint given [PR #384](https://github.com/sustainable-computing-io/kepler/pull/384) / Medium
    - Urgent / power: switch to model based estimator when the RAPL interface is not available [PR #388](https://github.com/sustainable-computing-io/kepler/pull/388) / Small
        - Merged
- Issues to prioritize
    - Urgent / Difference between kepler-model-server and kepler estimator [issue #375](https://github.com/sustainable-computing-io/kepler/issues/375) / X Large
    - High / Kepler general energy metrics vs energy tracing metrics [issue #365](https://github.com/sustainable-computing-io/kepler/issues/365) / X Large
        - New metrics need a new design proposal if they involve a new power source
        - Need a general enhancement template for new proposals (will follow up on Slack)
    - Low / Kepler on VM with Hardware Counters and RAPL [issue #367](https://github.com/sustainable-computing-io/kepler/issues/367) / X Large
    - Medium / Fix BPF dependency so that Kepler can always find containers [issue #364](https://github.com/sustainable-computing-io/kepler/issues/364)
        - Find a use case.
    - Urgent / end-to-end integration with model server, estimator and kepler [issue #349](https://github.com/sustainable-computing-io/kepler/issues/349)
        - Ansible expert needed

### Test

- [ ]

### Document

- [ ]

### Deployment

- [ ]

## Notes

<!-- Other important details discussed during the meeting can be entered here. -->
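The Dec. 13, 2022 update notes that reading CPU frequency from sysfs is expensive, that the kernel tracepoint fires only when the governor changes, and that hardware counters can provide the real-time frequency instead. Below is a minimal sketch of one common way to do this, assuming a pair of counters that both advance only while a CPU is unhalted: one at the actual core clock (such as perf's `cycles` event) and one at a fixed nominal rate (such as `ref-cycles`, or the APERF/MPERF MSR pair). The busy-time frequency is then the nominal frequency scaled by the ratio of the two deltas. The `CounterSample` type, the `effective_freq_mhz` helper, and the sample values are hypothetical, and this is not necessarily how PR #427 computes it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """One reading of two hypothetical per-CPU hardware counters."""

    cycles: int      # unhalted core cycles (assumed to tick at the actual frequency)
    ref_cycles: int  # unhalted reference cycles (assumed to tick at a fixed nominal rate)


def effective_freq_mhz(prev: CounterSample, curr: CounterSample, nominal_mhz: float) -> float:
    """Average frequency while the CPU was busy between two samples.

    Both counters advance only while the CPU is unhalted, so the ratio of
    their deltas scales the nominal frequency to the frequency the core
    actually ran at (the same idea as the APERF/MPERF ratio).
    """
    d_cycles = curr.cycles - prev.cycles
    d_ref = curr.ref_cycles - prev.ref_cycles
    if d_ref <= 0:
        # No busy time observed (idle CPU) or the counter wrapped/reset.
        raise ValueError("reference counter did not advance between samples")
    return nominal_mhz * d_cycles / d_ref


if __name__ == "__main__":
    # Hypothetical readings on a CPU with a 2000 MHz nominal frequency.
    before = CounterSample(cycles=0, ref_cycles=0)
    after = CounterSample(cycles=3_000_000_000, ref_cycles=2_000_000_000)
    print(effective_freq_mhz(before, after, nominal_mhz=2000.0))  # 3000.0
```

Sampling both counters once per collection interval yields a frequency without per-CPU sysfs reads. The ratio is undefined when the CPU was fully idle during the interval, so the sketch raises an error in that case rather than guessing a value.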
