CI-2024-01-31 - HackMD

--- title: 2024-01-31 parent: Meetings --- # ASWF CI Working Group Meeting: 31 January 2024 [https://zoom-lfx.platform.linuxfoundation.org/meeting/97730566101](https://zoom-lfx.platform.linuxfoundation.org/meeting/97730566101) ## Attendees * Jean-Francois Panisset, VES Technology Committee * John Mertic, Linux Foundation * Kerby Geffrard, OpenRV / Autodesk * Larry Gritz, Sony Imageworks / OIIO * Christina Tempelaar-Lietz, ILM * Larissa Fortuna, GitHub * Steve Glass * Andy Grimberg, LF RelEng * Aloys Baillet, NVIDIA * Jean-Christophe Morin, Res * Andrew Grimberg, LF Release * Ryan Bottriell, ILM ## Apologies ## New items * GitHub Actions team invited guests * Andy: presenting the ASWF, and our WG for CI. Libraries that build the DCCs, we are heavily dependent on GHA. Looking forward to GPU working properly, as well as ARM. * Larissa, Project Manager for GHA. Full suite of runners, Linux / Windows / Mac, Steve specializes on Mac. Working on networking, as well as bringing images. We can answer questions, work through roadmap. No specific presentation. * Roadmap: Larissa * Hosted runners * ARM and GPU runners * GPU: single Tesla T4 SKU, hearing that it's not big enough for some workloads * Looking to offer an A100, also looking at NS V3 series from Azure, can offer multiple GPUs, and has more VRAM (also feedback that current SKU doesn't have enough VRAM). In this coming year, not sure capacity wise how many GPUs we'll be able to get from Azure. But GPU capacity is always at a premium. * For ARM runners, we have the Mac M1, these are in public beta, avaialble now. Blog post went out this week, and available for public repos. Going into private beta for Linux and Windows ARM runners, we should be on the beta list for that. * We plan on eventually offer this for public repos, but for private beta it will be paid offering only. Some restrictions with image we're putting on these machines, not sure how to offer full set of tools, not a lot of tools. * Larissa: are we using container jobs? Larry: we just specify the container name. Some of the projects have a lot of dependencies, will chew up a lot of minutes buiding dependencies, so pulling the Docker container saves a ton of time on CI runs. It's not just using the GHA minutes, but also turnaround time on a PR / test suite, want builds to go quickly instead of rebuilding dependencies. * Larissa: we can't run container jobs on our Linux ARM runners since we don't have Docker pre-installed, and since you can't reboot the machine, you can't run builds inside a container. Also we can't run the setuppython action, if we depend on that, that's not configured to work. So don't want to cause frustration, we will have these in public beta. We don't have to decide this now, don't want to introduce something we can't use. * Working with ARM (the company) to have them build an image that will have those tools pre-installed, on roadmap for GA. * Working on custom images, ability to build your own image so you don't have to worry about this. Proof of concept will be in the workflow file, a decorator that will say "snapshot" to be able to snapshot the machine image, will upload to your GitHub account, and be able to run jobs against that image. Will have all those dependencies pre-installed. Will speed up jobs, no need for the "kitchen sink" we currently install on runners. Targeted alpha around March, public / private beta around summer time. * Andy: for custom images, does that mean we have to define custom runners for those? Larissa: yes, initially they will only run on the larger runners (for cost). We are in the planning phase of that. * Mike: the image have to be generated within the Action workflow, right? Can't just created a VHD yourself? Larissa: essentially we are creating a VHD, but you won't be able to create and upload it, we think it will be easier to create within the workflow file. Previously we had an alpha that supported image upload, but it didn't work very well. * Larissa: we can get you guys signed up / enabled when the alpha is enabled so we can test and enable. We are always looking for good feedback. * Larissa: lots of networking stuff coming up. Steve: just started owning the networking feature recently. Azure VNet is in beta, will GA end of March. Also exploring cloud agnostic via VPN gateway, can create a tunnel to your VPC for instance if you needed to access your own resources. Also adding network monitoring logs, and ability to control outbound network traffic. This is pre-alpha, but hopefully will be able to visualize outbound traffic and where it's coming from. * JF: getting a lot of all accessed network resources at the end of a public build would be helpful for people who want to build on-prem. * Andy: first I hear about VPN, brought to mind some other LF projects that are currently on VEXhost, part of their workflow can't shift to GitHub. Would this VNet would allow runners to run on a specific network with VPN access to another network? Steve: yes, would allow access to AWS / GCP and on-prem. Andy: what kind of VPN would this be? Steve: likely OpenVPN. Andy: looking at ONAP and OpenDaylight foundations. They have large jobs that need to run on multiple instances and need to test OpenStack, but we don't allow public access to these networks. Trying to move off our Jenkins system, and this sounds like this would allow it. Steve: will reach out to you, will be an early access repo, will get instructions on how to use. * Steve: also looking at "dials" for organizations to allow organizations to run longer workloads. Andy: some of our projects have weeklong jobs * Larry: on the topic of GPU runners, do you have plans to make those available for public projects? I've just changed a project to use a M1 runner primarily so contributors to project can run those actions before submitting a PR, but we're going to be in the same boat for GPUs. ASWF can pay for GPU runners for PRs, but leaves contributors without access until they are ready to submit, not able to run themselves. They could be low powered / metered, anything that helps public contributions would help. Larissa: It's a good question, understand why you are asking. Don't have plans to open up GPU to public for obvious reasons: potential for abuse, but also lack of capacity we can get from Azure. So we want to provide to a smaller subset of customers so we can provide capacity efficiently. * Larry: whenever we use things with paid runners, we have to split workflows in funny ways, would be handy if a workflow could skip running tests if a GPU runner isn't available so it's not a failure. Or maybe a "fallback" runner, would make workflows simpler. Larissa: makes sense, taking this feedback. * JF: we mostly use a single GPU, small. Larissa: have we been able to try current GPU offering? Larry: have been trying on and off for a year, my project use of GPU is very particular, mostly doing it through CUDA and OptiX. Had been having problems with getting the right version of NVIDIA driver / CUDA version, haven't gotten it quite working yet. Overdue to see how far we can get with what we have, but it hasn't been an easy road. We don't don't use it in a typical way. Larissa: once you have time, let me know any feedback you have. Larry: don't worry about it too much yet, will reactivate where I was in this process, and figure out what's the next stumbling block and provide better feedback. Larissa: we closed previous alpha, we closed that and made a new alpha with a newer runner. Hopfully this will fix issues. * Andy: have created new GPU runners, that was shared at previous meeting. Also can now see all runners available if you are a repository admin, new functionality in GitHub. * Mike: for custom images, do those have a required base image, or can we use any Linux distribution as a base. Larissa: initially we will provide a base Ubuntu or Windows image that you can build off. But eventually hopefully be able to build custom images on top of other custom images. We haven't worked out all the details so its further out on the roadmap. What sort of distro? Mike: our industry has a reference platform hinges on the Red Hat ecosystem, RHEL or downstream distros such as Rocky, Alma. Ubuntu in our environment depends on the studio, but not used as a build time environment. So usually a RHEL-like environment. Larissa: yes, should be possible as long as a Linux distro, eventually. * Mike: for models you mention, you said A100 which enable the vGPU splicing / "MIG" separation. Would you consider to use this to chop up GPU into multiple GPUs, or one job gets one GPU. Larissa: for security reasons would be one job gets one GPU, at least initially. * Mike: for networking components, would this enable a project to allow organizations to supply their own internal runners that can be defined at project / org levels, allowing their own employees to submit PRs that only run on internal runners? Larissa: we have ability to bring your own self-hosted runners, where you hook up to your org and run on your own runners. Mike: more being able to define specific runners to specific contributors, want to provide your own runners but not make those publicly available. Andy: are you asking for the ability to have a job run on different type of systems depending on access (fallback), but also enclose it in an ACL so only if you are in this team / named user? JF: sounds like forking into a private org and hooking your own runners would work? Larry: could be separate workflows, don't know if you can do conditionals? Andy: I believe you can add conditionals based on users who triggered the runner. When I get the minutes used report, it tells me what user instantiated the job, so an if clause in the workflow. * Larissa: so not included in the networking stuff, would need to be a feature request. * JF: anything more on the Mac side? Larissa: M2s are on the roadmap, working on acquiring the capacity. Also the different images, macOS 14 is in beta (or maybe GA). macOS 15 will be next on the roadmap. Next will be security and to bring them into runner groups with Linux and Windows, evening the playing field. We want to bring them to feature parity. Larry: I'm not sure I know what you mean by "runner groups"? Larissa: runner groups only exist for paid runners, they are how you control permissions for paid runners. Macs run entirely on labels for now, but we want to be able to admin them the same as Linux and Windows. Also advanced network features will be on Windows and Linux only, but want to be able to offer those on Mac as well. We rack our own Macs, whereas we do Windows / Linux on Azure infrastructure. The Mac infrastructure is GitHub specific, not part of Azure. * Larry: latest Actions status page is a great improvement. Before GitHub Actions when I was using Travis, I found handy when you have a bunch of jobs running, sometimes there may be a specific job that's the one you are testing, if there was a way to specific parts of a job without killing the whole job, that would be helpful to limit resource usage / speed up. Larissa: be able to re-run specific jobs? Larry: we can do this now, rerun specific parts, but want to be able to kill specific parts of a job. Sometimes you fixed a bug that's only specific to a component of a whole job. Would be handy, and would free up those machines for other users. * Larry: some of us monitor a lot of these busy projects, if you watch a project, you get a LOT of email. Would be great if you could get this as a daily digest with links, that would help limit the number of emails, a lot less cognitive load, only alternative is to stop watching a project. Larissa: I understand getting flooded with notifications! Will take this feedback back to the right team. * Larry: would also be great if you could just get the initial email for a PR / discussion so you can opt in to whether you want to follow a PR / discussion, you can unsubscribe to a specific discussion, but would be great if you could opt-in to a specific discussion instead of having to opt out. * GitHub Actions runners updates * [default free Linux runners now have 4 cores / 16GB RAM / 150GB SSD](https://github.blog/2024-01-17-github-hosted-runners-double-the-power-for-open-source/) * Is this enough for most workloads / are there CI jobs that can shift back to free runner? * Should we remove our custom ubuntu-20.04-4c-16g-150h runner definition used by OSL? Andy: removed that, took a look at last 60 days of usage, 2 uses from Larry, who confirmed it wasn't needed. * [M1 free runners with macOS 14](https://github.blog/changelog/2024-01-30-github-actions-introducing-the-new-m1-macos-runner-available-to-open-source/) * This should be a big help to limit costs * [New support for viewing runners available to a repo](https://github.blog/changelog/2024-01-17-github-actions-repository-actions-runners-list-is-now-generally-available/) * aswf-docker updates * No movement on 2023.3 * Now that 2024 commits have gone in, not sure how 2023.3 could be released from a branch? * 2024 progress * [Draft PR 186](https://github.com/AcademySoftwareFoundation/aswf-docker/pull/186) has gone in * Switched to LLVM 16.0.4 / 17.0.1, thanks to Larry's suggestion * Need upcoming OSL release (this week) for LLVM 16/17 support * Worked through some Conan / CMake issues, thanks for [CMake and Conan: past, present and future](https://youtu.be/s0q6s5XzIrA?feature=shared) video posted by JC * Got past issues with missing PySide6 binaries wrapper, thanks Nick * Had to disable test_package build for pybind11, qt and pyside until after initial release * Will create GH issues with remaining TODOs for a 2024.1 release * Permission issue with GitHub API creating releases of un-merged branches / SHA values? May need to merge PR to be able to create releases. * Andy: not best practice to create a release before merging code. For some personal projects, have a release drafter, on merges of changes, adds notes to a draft release, so a bit different, gets a new "draft release", only on merges, not PRs. When I tag code, it creates the assets for me, then manually release. But not what you're trying to do. For LF projects, we generate release notes on tag along with assets all at the same time, when we tag. JF: maybe there are logs? Andy: recently GitHub did some work around some CVEs, don't think it's related, but there were some permission concerns. * Can I get volunteers to review PR? Will let you know when Draft status is removed and PR is ready to merge * Hoping to have initial release by the weekend * OpenEXR issues building Python wheels on Windows ## Follow Ups * Automated release announcement to Slack channel not working? * May only work for releases from the main branch? * OpenSFF Badging Requirements * TAC discussions on the requirement for Gold level for (new) projects to reach Accepted Stage * [Current project status](https://docs.google.com/spreadsheets/d/1n8xEdbJ77fVk5YxtuqjC7KZywi0W7ZfXlGf0YjVZI9Q/edit?usp=sharing) * Two PRs to look at: * [Adjust requirements for the OpenSSF Badge at the Adopted Stage](https://github.com/AcademySoftwareFoundation/tac/pull/556) * [Update the Best Practices Page with spots for detailed instructions on how to fulfill each requirement](https://github.com/AcademySoftwareFoundation/tac/pull/557) ## Tools and Links * Snyk support for C++ scanning [testing in OpenEXR](https://github.com/AcademySoftwareFoundation/openexr/pull/1608) * [Security risks of running your own GHA runners](https://johnstawinski.com/2024/01/11/playing-with-fire-how-we-executed-a-critical-supply-chain-attack-on-pytorch/) * [Tart virtualization toolset for macOS and Linux VMs](https://tart.run/) * [Cirrus CI system](https://cirrus-ci.org/) * [GHA Log Improvements](https://github.com/orgs/community/discussions/89879) * [Ubicloud - Cheaper GHA Runners](https://www.ubicloud.com/use-cases/github-actions) * Still relevant given improvements in free runners for us?

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.