# Packaging Summit at PyCon US 2024

https://hackmd.io/@pradyunsg/pycon2024-pack-summit

## Arbitrary package metadata: the key to enabling smaller binaries (Michael Sarahan + Ethan Smith)

**Slides: bit.ly/pycon-packaging-metadata-2024**

### Q&A / Discussion

* Q: Clarification on the use of the word "environment"
* A: It's less the physical package environment and more the way you pass that around to other people, i.e. a "lock file"
* Followup: "Environment" is a very overloaded term, maybe we need a better semantic? Not getting down to the details of an environment?
* A: Talking about the spec: how do you provide instructions to an installer to install a package?
* Q: Re: tags, if you don't want to put the burden on PyPA maintainers, do you want an approval process or is it anything goes?
* A: That's wrapped up in the implementation of what purpose the tag serves. If there is a chance for confusion it needs to be more centrally managed; would hope for a consortium of people who rely on that feature to come together and help manage that collection.
* A: Our goal is to support ecosystems like MPI that are ever evolving, so the mechanism needs to be flexible enough to adapt to those ecosystems
* Q: The way env markers work, a lockfile could be totally different depending on what hardware you're on, so without restricting via specification it's very hard to tell if two platforms/targets can use the same lockfile
* A: That's what I was talking about re: HW implementations. So the idea: come up with a stub that is as specific as can be, but then the CUDA stuff doesn't show up in the lockfile and the stub would be responsible for "rehydrating" the CUDA-level requirements
* Q: But in the stub, couldn't it have a conflicting requirement?
* A: I'd say no, there wouldn't be a conflict, but the stub might just try to install stuff that's not compatible and the environment creation tool would fail
* Q: Well, maybe rather than failing, it should trigger a new dependency resolution
* A: Excited to mess that up some day
* Q: Is this about arbitrary metadata or rather a bounded set of things we want to capture? How important is it really that whatever we come up with works with arbitrary metadata and is forward compatible, vs. just solving this use case (like manylinux)?
* A: I err on the side of arbitrary because you don't know what's going to come in, and I don't think it's much harder to support arbitrary metadata
* Q: One difference with conda though is that conda is centrally managed, whereas here there needs to be coordination between tags and installers, and if you're talking about environment markers too, this would allow arbitrary values of environment markers
* A: So maybe we should start with a limited subset and see how far we get.
* Q: Question about rehydration: you're talking about CUDA versions, but does this include a CPU-only version too?
* A: The idea actually goes way back, an idea from Alyssa Coghlan in 2013 about using it for NumPy, so being open to a lot of different tags is useful
* Q: So you're suggesting a stub with holes that can be filled in with different tags during hydration, which can work or can fail
* A: Yes, it can fail and is not a given. We need environment specs to truly capture the environment and need a record of what it was, even if it doesn't work on your machine
* Q: Quansight does a lot of installs on both Conda and PyPI. We currently have the concepts of a spec file and also a lock file, but those are not enough, because a lock file assumes an identical machine and cannot easily be modified, whereas the spec file is under-constrained. E.g. in my lockfile there may only be a few dependencies I actually care about whereas others I don't mind rehydrating, so I feel like there needs to be something to capture the relationship between the spec file and the lock file.
* Q: I wonder if it's helpful to separate the package resolution + metadata part from the "downloading the bits and sticking them in the right place" part, and if there's an opportunity to think about deferring the download and install of the bits. E.g. if I know I have a machine I want to install on, can we resolve that whole hierarchy and then only download and install the bits I actually need?
* A: In my mind you've described PEP 658, and the only thing I would add is it went to a "sharded conda" model where every package has its own "package.json", and the one missing step is the aggregation step.
* Q: Is this a problem in other languages? How has Nvidia solved it there?
* A: I've only worked with Python and C/C++; with the latter, people are happy to download it directly from us, no PyPI
* A: Most other languages don't ship binary libraries, e.g. in Rust you just build from source every time, so it's not as big of an issue. Python is somewhat unique in that respect, and for things like Debian there's just one package that gets built
* Q: So would you say Rust doesn't have to handle the shipping of CUDA because it can assume the system has CUDA on it?
* A: So I think most unofficial bindings expect you to have CUDA installed, and then you just link to a system CUDA
* Q: This is driven by the desire to support GPU problems; is this actually useful for other problem spaces or is this just going to add complexity for GPUs?
* A: The main person who pushes for this on Discourse is Oscar Benjamin, who's been pushing for this for years, does stuff like high-level math, and needs this for MPI. Most interest is driven by CUDA but not all
* A: There are two dimensions to this: the software compatibility story and the hardware compatibility story
* A: For libraries, you either have to assume every single system has it, or embed it in every single wheel. Would be nice to have an in-between
* A: Would like to add there is in fact a PEP to address this
* A: When you talk about a system package manager you typically need admin rights, whereas a benefit to Python is that you don't have to have that, like when working on government clusters
* Q: Have you looked at Spack and how they've solved this problem?
* Q: Can you get everyone to use Spack?
* A: No, but it seems like it could help inform your implementation
* A: Will be monitoring how they solve it, but despite the fact that Conda seems to have solved this, we have to meet people where they are with PyPI
* Q: As a user, they don't know they have to use Conda for, e.g., PyTorch
* A: Well, that's PyTorch's problem?
* I think that's really the community's problem
* A: In HPC environments the org is basically handing you a pre-built environment, and developers are coming fresh to an environment with nothing
* Just want to thank you for bringing this up, since for users, knowing how this works is non-trivial. So my question is: how can we build a bridge between Nvidia and PyPA?
* I think we can definitely collaborate; Nvidia recently hired Barry Warsaw to work on PyPI and PyPA projects to help contribute
* But Nvidia cannot promise to donate 1 FTE on this to the end of time
* That's why I'm worried
* Q: Talking about user stories, every time I meet with companies we talk about PyTorch: the wheels are so big, it's so complicated, and we want to maintain both CPU and GPU Docker images, which is very common for many companies. Personally happy to help with user stories and want to help with that side of things.

## What even is Python "packaging"? (Pradyun Gedam)

* We have a lot of people here interested in Python packaging, but we can't all agree on what it means
* Some folks think it should be just the tools that take a package and dump it on a machine
* Others think it should solve a much broader problem of handling the whole ecosystem

### Q&A / Discussion

* I'm at Astral and work on uv, and have been thinking a lot about this.
* The core feature is just installing packages, but people also expect a lot else:
  * Managing Python installs
  * Toolchain management
  * Managing venvs on the machine
  * Environment management
  * Adding deps to pyproject.toml
  * Project management
* I've been writing up a doc on how to group the ways people use packaging into two big buckets
* On one hand: producers who package code to then distribute
* On the other: consumers who are consuming packages others wrote, in different ways
* For the past decade, we have hyper-focused on the tooling and missed the ways we can make the UX better
* I see now with pixi, uv, conda, pip, from a user standpoint there's gotta be some convergence. I can tell you time and time again, the UX on Windows vs. Mac vs. Linux results in backing educators into a corner
* Educators don't want to have to know how to install Python, install packages, and set up environments on every platform
* Until we focus on those use cases, I think Python will continue to be the laughingstock of the tech community in terms of being way too hard to get started with
* Carol's framing of producers and consumers: https://pysplash.github.io/dog-paddle/
* My personal feeling about this is that the story of how to get Python is completely decoupled from how to get Python packages
* Feeling that the core team doesn't really care that much about how people get Python; they provide Windows and Mac installers and everyone else is basically on their own
* Don't know if there's a good solution for this, but really wish there was a better story on this
* I think really focusing on the actors in this story is the important thing we need to spend time on
* I.e. people who are building components that are meant to be part of a larger thing. They do this because it's fun, or scratches an itch, or they like to share that with the world
* Others who are pulling in and integrating lots of those little components
* Their consumers in turn don't really care that Python is involved at all
* Then you have the redistributors who are going to solve this problem by giving you a whole distribution, e.g. a Linux distro
* Every one of these thinks of packaging in a different way
* I think trying to think about it from the POV of the users is the key step we need to make: the right set of tools and the right experience for each of those actors
* A couple of anecdotes here: I have a 15-year-old daughter who wanted to learn games in Python. So I said I'd help her get her packages set up so she could do it. It took me an hour to get a working system and be able to use it from VS Code, and I have many, many years of experience in this; I had to try three or four different ways before it worked.
* I gave a talk at PyData New York where I shared my best practices on this, but I have years of experience dealing with these problems
* What's the first step of any package's readme? Pip install or conda install. But it might break the user's current environment
* We've all been using environments forever, but that's never what we tell new data scientists; we just say pip install or conda install
* The other thing is we know the best practices; we need tooling that implements them on the backend and doesn't burden users with the cognitive load of understanding all the other concepts in order to have a stable environment
* If others saw Leah's talk earlier, it emphasizes the problem of having too many options, and it's just too hard for overwhelmed new users
* Having been around with pip for a while and then thrown into Conda, I saw firsthand how it was such a problem for users, where people were mostly focused on technical standards rather than on users
* We need to find a place to solve this problem, but I think PyPA might not be that place, as it's unfair to volunteers
* See the doc linked above, but wondering if the consumer side of packaging needs to be addressed outside of PyPA and the tooling.
* We have some best practices but they don't go far enough
* We have: install Python, don't use system Python; then don't develop in your global space but use an environment; then you need to activate the environment (depending on which tool); then you need to add or install packages
* Those are the basic fundamental steps, but as a new user I need a very simple experience
* Then you have people who are coding every day but not going to distribute a package
* Then it's different again if you're trying out somebody else's project, if you clone their repo and try to build that
* I can tell you, working as VP of Engineering with a SaaS product, that without GPUs it's already hard, but with GPUs it's a nightmare
* You hinted at two dimensions, environment activation and ???
* It really seems to hint at the Python import system being the problem, with system site-packages that the user can't write to, and that's what it gives packaging tools when they ask where to write to
* Feels like there are two sets of problems
* Those that are genuinely Python's fault, that we could fix if we could go back in time and use what we've learned from other ecosystems
* And there are the problems like system Python, GPUs, binary packages, etc.; Python is so popular because it's a language people use widely, so it has a lot more problems it has to deal with
* In the interactive case, what if we had a version of Python with behavior optimized for interactive development, e.g. Python automatically activated the environment in your local directory
* On my team we have people who've never worked on Python before; there's a rite of passage where people install WSL2, break their OS, reinstall everything, then install and use the correct Python, etc.
* We have a note with a guide for our team on how to avoid that, but there's tons of old advice out there on SO and on Google that's from 2008, resulting in a lot of conflicting advice and much that will break your Python or your OS
* Not sure of the solution, but I know it's a problem
* Want to be a little compassionate about how we got to where we are today; I've seen it over and over again in other places. We ended up here because there were problems that needed to be solved at the time in the past, but the truth is the universe always changes, so we're in a different place now.
* For example, Python's hands-off approach to the packaging story was similar to other languages at the time, but now with Cargo, etc., people have a new expectation of what their programming environment should look like and what that should be
* We have an opportunity now with other working examples to really re-examine what we're doing and how we can incorporate those lessons learned to modify how we do things
* Does it continue to make sense for Python to be hands-off in the packaging story? I personally don't think so; I think it's time we spend more time on this
* Regarding the two Pythons: what happened when Apple took Python out of Mac? Did it help anything?
* Well, people have to install the latest Python now, so maybe a little
* Though people also install brew, and brew has its own problems which could make it harder for them
* But at least it was not going to break your OS
* Though Apple also had protection against that
* Going to suggest something that requires little to no development
* When Kubernetes was brand new, we ported JupyterHub to work on Kubernetes
* Problem: depending on which vendor, there was a completely different way to get to Kubernetes. So we developed the Zero to JupyterHub guide and had to write to a ton of different vendors and platforms to provide a similar experience, so everything else was straightforward
* So what if we did a "Zero to Python"-like guide that would give users the four meta-steps you need to do, and then give each vendor a page to put their documentation in the same place?
* I just want to concur with what was just said, the idea of coming back to one authoritative source of information, and I understand if the PyPA isn't ready to take that on, but someone needs to take that on. Would be happy to help with that
* I have taught Python for a long time and we taught them one way to do things, which made things really consistent, but we had to create that ourselves.
* If we can organize around documentation as a first step, then that could make a huge difference right now.
* This is about the user experience with Python in general, so I think it should be on the Python website, as it's fundamentally the user experience with Python
* Every time I go into a company they say "that's not what Python.org tells me to do" and I need to convince them that I know what I'm talking about

## How Can The PSF Best Support Python Packaging? (Deb Nicholson)

* I'm the executive director of the Python Software Foundation
* Despite the difficulties, people are still downloading an enormous number of packages, so we want to make that easier
* Things that used to be small problems are now big problems
* For example, malware on PyPI: we hired Mike to help take care of that, thank you to him
* Or there's a constant stream of people who lost their passwords; that's not a volunteer job anymore [so hired a PyPI support person]
* Finally, the number of people who are interested in Python packaging has outpaced the number of people to respond to them
* Two ideas to start out:
  * Formalize the PyPA governance [cheers]
  * Get some funding to help with packaging [even more cheers]
* How the PSF can help: we can raise money, but we need a detailed and competent roadmap for the funders to break out the checks
* Many nonprofits share this thing where we depend on volunteers as well as paid staff
* Volunteer work should be interesting, fun, done in spare time, and benefit from the presence of a specific community member
* Paid work: anything that takes 30 to 40 hours per week should be done by someone who is getting money
  * Repetitive, boring, not all that fun, but that person could think about how to automate that job
  * Anything security-adjacent and critical, or that creates liability
  * Anything that needs constant sustained attention should be paid work
* So, now, how can the PSF help?
* Love the goals, don't care as much how we get there

### Q&A / Discussion

* Q: Think this is great; just, if we're adding more paid people, want to make sure we also acknowledge the volunteers who've been doing this for many years
* A: The way I think about it, it's not a vanity project; rather, we want the Python community to succeed, so we want to hear from all of you doing the work
* Want you to think about all the things you do like doing, and those you don't, and see how you can offload the latter to the PSF
* Want to comment on the process of getting money from funders
* You don't have to have everything figured out to ask for money; you can ask for money to figure that out
* For example, Pradyun and Sumana got that to make a plan for the pip dependency resolver
* The military often does that when contracting
* You do have to have a plan for making the plan
* A: For the PSF as a US nonprofit, we need to fund things that are on the PSF roadmap already and align with our mission, and it needs to be ratified by the community
* To have successful governance, we need to have all the major voices represented: the pip world, the conda world, and the rest. I would love to see some solid user research, as decisions made in the past were the best decisions at those times with those needs. If we're going to redo things now, it's important to first listen to what the community needs and understand what they care about, and not assume things about the user that might not be the case.
* A: I love that addition
* We did a lot of this work when the pip resolver stuff was happening, but one thing we identified was that we need more of this
* One thing that I think needs to get done is the governance of how money gets spent, whether it's the Python Packaging SC or something else, as right now if two people object to something it doesn't get done
* So right now not much gets done
* How many people would like to see a packaging council? [Nearly all hands go up]
* First, we have to decide how we decide before we can decide
* We have decided that, and that's the existing people who make decisions, plus the steering council, plus all the existing people in the PyPA
* Who represents the users?
* The PyPA
* Not good enough
* Is the proposal something that can be changed through the PEP process?
* Yes
* I think that the user experience needs to be decoupled from packaging
* In the last ten years packaging has been very focused on the tooling and the infrastructure, so the interest and the bandwidth are not there to address the critical need for UX
* We have not been able to organically solve this in the last decade
* So if it's a separate workgroup under the PSF that gets funded to create content for the website, I would like to see that happen sooner rather than later, to get the new governance structure up and running and effective
* What is the reason we don't have a Linux installer on Python.org? Lack of someone stepping up to deal with it, or something on the PSF side that is not providing support, or is there something else?
* Not something on the PSF side
* So the release process may seem a little mysterious, but we have release managers, and the RM is responsible for Python itself, plus Mac and Windows experts who consume that release and produce installers
* It's partly historical (leave it to the Linux distros to consume it and package it for their own distros), and partly because nobody has stepped up to be the Linux expert
* So it's wholly on the Python side and not the PSF side
* Isn't it also because Linux distros already have Python, and a non-opinionated installer of Python wouldn't be practical?
* Yeah, I know from when I was working on Debian: Python is integral to running the Linux operating system, but system Python is a terrible place for people to write their own things in Python
* A Linux expert on the RM team would have to figure out how to give users an end-user experience of Python on Linux that would not conflict with the Linux distro
* I'd like to respond to this: knowing how not to break an OS is maybe not something we can ask a volunteer to do (e.g. I've broken my operating system many times), so maybe it's something we could get the PSF to fund
* If it's a matter of connecting with the Linux community, we can pay to fly someone out here, or go to their events
* At least one distro team is working on moving their own Python out of the way, so users can install their own Python packages
* Fedora tried platform-python and it broke everything, so it didn't really work
* The other issue is that on Linux you don't have one operating system but 20, and what we discussed today would break the policies of at least 10 of them, because they don't want it to be outdated and separate from the system
* That's already the case with Go and Rust, and people don't use the distro packages, so we'd just be going the same direction
* I think now that distros have adopted PEP 668, telling end users not to use system Python, I don't think it would be too controversial for Python.org to offer this, because what you're doing is not Python as part of the distro but your own independent work
* Two points: Linux distros don't like that
* It would mean another way of doing things
* And it would mean the core devs would have sole control over doing that, and reading the room it seems people didn't want that

## Quick plugs, shout outs and lightning round

* Give our new community-developed tutorial a test run: [create a pure Python package using Hatch](https://www.pyopensci.org/python-package-guide/tutorials/intro.html)!
* Talked about this last year; thought about making a PEP, but someone suggested just making a library, so we have something right here, not quite ready for production
  * https://github.com/scikit-build/dynamic-metadata
  * This is a way for different build backends to accept generically written dynamic metadata plugins, which would have the ability to produce metadata
  * Three are in scikit-build-core: a regex to pull out the version number, Fancy PyPI readme, and Scikit-Build
  * The purpose of the dynamic-metadata package is to have a central place to register these things
  * A provider field says where to pull that from
  * It hasn't been finalized yet, so feel free to offer feedback
  * There's a discussion section on how authors should do it
  * There's an issue where one plugin needs to depend on another plugin, e.g. fancy readme needs the version number from somewhere
  * Should we keep it simple, or should we add support for this, or go for full topological sorting?
* We built a package resolver and installer called uv
  * The goal is to have high compatibility with pip but also an opportunity to do things differently
  * The biggest way we deviate from pip is that uv is venv-first: if the venv isn't activated we use it anyway, if there isn't one we throw an error, and to install into system Python you have to actively opt out of that behavior
  * Just curious, since we have an opportunity to take hard stances on things, e.g. we don't support `--user`
  * Not deeply steeped in the history, so want to know what things we shouldn't do vs. where we should be trying to change user expectations of how things work in Python packaging
* Want to give a quick demo of Conda-Store
  * The idea behind it is that the end user should not care how things are installed under the hood (e.g. pip vs conda)
  * Want it to be that I pick a package and behind the scenes it does whatever the best way to get that package for you is
  * In the backend, it builds a lockfile, builds a Dockerfile, builds a conda image, so that all artifacts are automatically generated and, when people need to care about them, the artifacts exist
  * When I add a package, it's automatically added. When I need to roll back, I can roll back
  * Additionally we have namespacing, so we can separate pip vs. conda vs. corporate packages
  * Jaime is working on pip integration so we can mix pip and conda packages relatively safely
  * Give me feedback on how we can do this better
* Two quick things:
  * I run several meetups in the SF Bay Area
  * We do project nights where reps of FOSS projects can recruit contributors and host sprints
  * If you want to participate, contact me!
  * PyBay is happening September 21st, so if you have a packaging talk, please come talk about these issues!
* Want to share [pixi](https://github.com/prefix-dev/pixi), a similar package manager project focused on the conda ecosystem
  * Installs Python for you in a conda environment
  * Can also install PyPI packages in there using uv
  * Can mix conda and uv in an environment that actually solves
  * We have examples, documentation, etc.
  * Give it a go and we would love your feedback
  * We had a different PyPI integration called rip, which we abandoned in favor of uv
  * Focused on project management; can use pixi.toml and also supports pyproject.toml
  * Can also do C++ projects, Fortran projects, etc. with pixi
* CI/CD environment and maintainer
  * I know how difficult it is for volunteers to keep their projects running, without any thanks and with lots of complaints
  * First of all, I want to apologize for the rough words that happened in the past
  * But want to thank the pip team and others for all the packaging PEPs and other things that the team has implemented
  * To make everybody's life a little bit sweeter, I brought some sweets from Poland
* [jaraco] Looking to get into the packaging challenges and apply the lessons I've learned working on monorepos at enterprises
  * https://github.com/coherent-oss/system
  * At these companies most of the ancillary concerns are handled by tooling and infrastructure; devs only need to work on code
  * Want to do something similar for individual developers
  * Currently the tool infers a lot of the package metadata based on other signals in the repo, e.g. authors, Python versions supported, etc.
  * Worked on it for a couple of weeks; be on the lookout for news on it
* Two things:
  * One, on my GitHub, the project [PyApp](https://github.com/ofek/pyapp) allows you to ship Python requirements as a single project
  * Two, let's talk about Brett's lockfile PEP after

## Breakout sessions & Unconference

### Governance discussion

<details><summary>Unstructured notes</summary>

Carol: different ecosystems that need to be addressed
1. PyPI -- where the packages live
2. A better story in supporting GPUs
3. Packaging tools -- lots of them
4. The brand new Pythonista experience
Some thoughts:
New languages are really easy for people, so there's more pressure to make Python easy
Seeing more convergence instead of divergence in software generally

Deb: The Python website should be great for newcomers

Phebe: We've invested a lot into the tooling
But almost nothing in the documentation
We should be giving that work respect, responsibility and funding
We need to cultivate that respect

Tania: everything is tool-centric.
Mike: Personas, "This is for you, newcomer"
Could the PSF hire a full-time technical writer?

Carol: "pathways": beginners and data scientists have different pathways

Mike: will they find what they need at Conda? (Carol thinks no)
Conda is a whole ecosystem

Pradyun: we need to set out the lay of the land

Phebe: frustrated with the lack of a clear communications channel
We need both the legacy and the "board"
Rust's "This Week in Rust" is an example of a great resource

Deb: The PSF's bubble should be getting "sticky" for newcomers and casual users

Barry: The steering council has been bad at transparency
Should be better soon

Deb: The PSF can handle managing writers / community comms / community relations people

Jannis: "cabal"
Let's not get too into the technical writer idea, and not into anyone making decisions

Carol: Wants to add "folks coming from other languages" to the PSF's bubble
The scope is something we need to decide

Phebe: to build trust, we need to make a long-term commitment instead of a lot of six-month experiments; a facilitation role

Mike: It's been challenging to "pick a winner"
We should be opinionated about where you start and still have a lot of things
Can we all acknowledge that no solution will accommodate 100% of use cases?
We fight back on all the edge cases and it makes us stop moving forward

Barry: past the time to put our money on a horse... let's not stifle innovation
A "standard" path plus support for other options
The experience of using the toolset
Personas: actual beginners, part-timers, tool builders that support those scientists, (people doing glue work?)
Andy: newcomers is where I want to focus too; have things work for 1000 GPUs
Small shops are being cut out of the pathways

Carol: we'll never find the "one tool to rule them all"
pip up, pie up, or conda up, but we could make the commands a little more standardized

Pradyun: the king is the maintainer, and we're talking about ways to take away their control, and people will be mad, like taking a teddy bear from a kid

Phebe: the packaging council would need to have the respect of the teddy bear holders, and to make that communication piece centralized

Pradyun: the missing piece is "how to establish the authority"; it has to be someone everyone would listen to

Peter: when decisions are made without the full dimensionality of the problem (five dimensions)
My proposal is that we converge on a view of the problem
My dimensions: interfaces are not the same as packages or run-times
We have to start further towards the beginning: "what the heck is a package?"
We have to treat C dependencies as a first-class concern
The build system is very important for this concern
Exploding problem of low-skilled open source contributors, coming from Excel, etc.
There is no standard Python application, and that is different again from distributions

Carol: we have a responsibility to mitigate the pain, now

Mike: we split out the build front end and back end

Shauna: a lot of these things reflect the people and perspectives we'd need on a steering council, except the user experience perspective.
We can get people to give us their teddy bears if we make them feel respected and part of the whole picture

Barry: it has gotten a lot better; honor all the work that has gone before
We can evolve, and get to a better place. Make sure the teddy bear owners have a voice in the bear's future.
Governance, the model for that: we have a steering council that delegates work to other entities, e.g. the Typing Council, if we made a "packaging council"

Carol: the focus is on tech at the expense of the user
Rust is doing a great job

Barry: Rust has been really good in many ways
They've put their weight behind Cargo

Andy: The response is two fingers, new topic is one finger

Pradyun: the typing model will not translate; they are very much not independent, packaging is all interoperability
CPython says: this is how the implementation will look
Packaging is always about use cases
We need functional decision making and I think we need a council to make that happen
Let's just take the first step

Jannis: it took years of effort to make Cargo user-friendly, with designers and researchers
We should look at user experience/success/happiness; no one will lose face

Pradyun: Cargo is at the same level as the language in importance in Rust

Barry: The PSF should fund that user experience research

Phebe: The packaging council could have sub-committees on "operating systems" and "PEPs" or "tools" and "user experience"

Peter: Telemetry is not a nice-to-have, it is 100% necessary
Let's do it non-creepy

Jeremiah: Is the SDK in there? Is there a place for environment managers?

Pradyun: responsibility of the steering council

Phebe: opinionated, but not opinionated. We need to make decisions timebound so we can actually decide things

Carol: my proposal: let's have the PSF create a user-success workgroup that can hire a person

Barry: Packaging Council

Recurring ideas about work the PSF could fund:
Communication
Technical documentation
User experience research

Outcomes:
Create a user success working group, ask Board members to flesh out and instantiate
Not necessarily limited to Packaging (UPDATE: a draft charter is already in the works!)
  * Broad agreement about a packaging council, which will probably need to set up some sub-committees to focus on things like operating systems or tools.

</details>

### Symlinks discussion

1. Should we do this?
   - Symlinks are useful for shipping versioned binary libraries (e.g. libcudnn.so.9 pointing at a file with specific version info, e.g. libcudnn.so.9.1.1) and potentially for editable installs.
2. What shouldn't we do?
   - Unrestricted links to arbitrary locations on the filesystem (no symlink to /etc/shadow)
   - The source should never be in a different package; it should always be inside the source package
3. What will we do?
   - A `LINKS` file? Could be in the `RECORD` file. Preferable to simply stuffing POSIX-y symlinks into the wheel; e.g., it would potentially allow for [NTFS support](https://learn.microsoft.com/en-us/windows/win32/fileio/hard-links-and-junctions) and portable wheels (how do you deal with multiple platforms? New platforms?)
   - Tools should resolve the paths to check target paths
   - Tools should check for dangling links, and error if there are dangling links at the very end?
   - Tools should/may check the hash of the targets
   - Behavior on Windows, and on Linux filesystems without symlinks, should be explicit
   - What paths should we accept?
     - At least within the same namespace(s) in the wheel
     - What to do about the data directory?
   - Should/may check for cycles or long symlink chains
   - Symlinks should be regular files
4. Unsure
   - Can distribution A's packages have links targeting distribution B's packages iff A depends on B? Do we need this, and is it possible to do this performantly?

### "Making Python Packaging Easier" discussion

#### Experience of the attendees:

* It's easy to be mad, but it's difficult to remember what exactly went wrong
* It's way more difficult to find information than it is to do the actual packaging parts
* Lots of self-teaching about how to package; there are different tools for it (hatch, conda, pyproject.toml + wheel, etc.)
  and the information doesn't authoritatively seem to be in one place
* Lots of teaching and unteaching people how to do things
* Monorepos!
* Dealing with 3,000 contributors, so easy-to-install packaging is incredibly important!
* Open source libraries, some maintained and some not, using tools like Poetry; what's the "best" way to do this stuff?
* Maintaining 2-3 monorepos
* Created an organization called PyOpenSci to teach scientists and other folks how to use Python: installing Python, getting started, making Python projects, etc.
* Introduced to conda in 2019 during an internship, after not being able to install TensorFlow; is very interested in talking to the community, and notices that the topic of packaging is incredibly advanced-user-centric. Python is supposed to be an easy language to learn, but JUST INSTALLING IT is so difficult that it's almost a blocker, so "you don't have to install Python to use Python" is almost a selling point!
* Developing tools for Python developers, focusing on user-friendliness and UX; just wants to create simple tools.

#### What is the current state of things?

* Googled "packaging a Python project": the top results were a "Packaging Projects" page on the Python website, a blog post, and a Medium article.
* Why did the PyPI and conda ecosystems get created? Python was originally created as an educational teaching language. If all of your tools are in Python, everything in your ecosystem is supposed to work well together. However, the tools that scientists and data scientists use are very commonly written in C, C++, etc., so there's something called the "native binary problem". Making this stuff compatible across the board is an incredibly challenging issue! Conda was created to resolve those binary compatibility issues.
* The idea behind wheels is that you "stuff all of your binary pieces into your wheel", so that every package becomes a distribution.
  Then you encounter the dependency problem: PyPI's metadata doesn't give tools enough to intelligently decide anything. If you're installing things from PyPI, even with better tooling like uv, the binary incompatibility problems are not solved. Compiling everything from source and making sure everything is compatible (i.e. a monorepo) is going to produce fewer errors, but it's A LOT of work.
* Are wheels allowed to install into any arbitrary location? Yes, and things wouldn't work if that weren't allowed. Wheels are getting larger and larger because you basically have to "ship the world", and that world is growing constantly.
* PyTorch and TensorFlow usually need separate environments because they are incompatible (why? few know the details). Pip very often installs things and pulls in dependencies that break other things. Q: What kinds of errors do you get? A: Usually runtime errors.
* Something to mention is that [name redacted]'s monorepo does _not_ enforce version constraints. When something is pulled in, something goes into the metadata; when one component is upgraded, nothing is checked for compatibility. This is allowed because of a base assumption that there's a unit test written somewhere that checks things. 😱
* If you type "python" in the cmd or PowerShell terminal, a link to the Python website shows up.
* Current best practice for one participant at the table: start with a spec file, install binaries from conda-forge/conda and everything else from pip via a requirements file. The problem detected with this is that some packages have different names (namespaces) on PyPI vs conda-forge; automatically detecting the correct mapping is an incredibly manual process, but it's currently being worked on.
* A "happy path" could be to constrain which tools are used until a blocker is hit, at which point another tool is introduced (the initial tool would differ depending on the user's OS).

#### How/what to fix?

* Something we should focus on is potentially automating virtual environment creation; i.e., when people want to update a package, take a snapshot of the old environment's specs, then create a new environment with the upgraded version of the specified package.
* Is it a good idea to let people download Python from the Python website? It probably isn't, especially on systems like Windows. All of the search results out there tell you to pip install everything, and there's zero understanding of what a virtual environment is.
* What if there were a mode in IDLE (the IDE that ships with Python) that creates a virtual environment for you?
* Documentation tends to be written by developers (who are advanced compared to beginners), and it shouldn't be written by just advanced people.

#### Possible solutions

* Educating people is a good first step; the Python website doesn't even mention that these problems can exist. It presents Python as "batteries included". The messaging is not quite accurate: all of the problems are solvable, but it's not necessarily going to feel "easy" for a newer Python developer. Saying something is "easy" when it doesn't feel that way is going to turn away a lot of potential Python people.
* Some sort of webpage or UI workflow where you can select specifics like your OS and the task you want to do (e.g. data science). It would also be good to find out the user's level and what tools they're comfortable with/have used before (e.g. using the terminal vs. "never used the command line before").

#### What is currently out there that can help?
* [Python packaging tutorials by PyOpenSci](https://www.pyopensci.org/python-package-guide/index.html)
* [`conda-pypi`](https://github.com/jaimergp/conda-pypi) (eliminates the need to move PyPI packages to conda-forge)

## Summary

### Symlinks

**See above for full summary**

* Talked about adding symlinks to wheels and how that might be useful
* Discussed adding a new metadata file in wheels holding source/target pairs to encode symlinks
  * Would avoid platform-specific aspects of symlinks
* Also discussed where symlinks should be allowed to have their source and target paths
  * E.g. not /etc/shadow
  * In the package directory
  * Should we allow inter-package links? Open question
* Discussed whether and what checks to employ for these links

### User experience and documentation

**See above for full notes**

* Discussed our experiences and opinions, which are extremely varied
  * Goes all the way back to installing Python
* Two possible solutions:
  * First, educating people
    * The Python website is not necessarily accurate and complete in presenting the specifics of installing and using Python and its packages
  * Second, it would be nice to have a website that asks users a few questions and gives recommendations tailored specifically to their use case

### Governance

**See above for full notes, hidden in foldout**

* A bunch of ideas for work the PSF could support
  * People to help with documentation, UX research, etc.
* Want to create a user success working group and invite people here to participate, beyond just packaging
* There was broad agreement about a packaging council
  * Might want to set up subcommittees to focus on specific problems
* Notes will be added here by Deb (PSF ED), after typos and some sensitive information are trimmed.
### Dynamic package metadata for GPUs, etc.

* Discussed the proposal presented earlier
* Came up with examples of these tags that might be useful
* Also discussed some problems
* The packaging project might be the best place to add hardware detection

### Integrate wheel package into setuptools

* Discussed the topic
* And got testing working with the coherent build system

## Callouts for sprints people might want to attend

* NVIDIA folks welcome people to come work on the dynamic GPU metadata proposal
* Symlink support in pip/wheel
* Would like to work on installing things in hatch and tool support for it
* Want to work on a graphical interface to packaging tools following best practices
* Will be sprinting on conda and related projects, as well as conda/PyPI integration
* PyOpenSci will be sprinting on their opinionated packaging guide
* PyPI will be sprinting on improvements to Warehouse, the codebase backing PyPI.org
  * Remove pain points for PyPI
  * Improve search using Elasticsearch
* Anaconda/Peter Wang wants to work on exploring the dependency structure of Python packages; email pwang@anaconda.com
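As a rough sketch ahead of the symlink sprint: the checks proposed in the symlinks session (targets resolved and contained within the wheel, errors on dangling links, detection of cycles and long chains) could look something like the following. The entry format (archive-relative link/target pairs, as a `LINKS`-style mapping) and the `validate_links` helper are illustrative assumptions for discussion, not anything specified at the summit.

```python
from pathlib import PurePosixPath

def validate_links(record_names, links):
    """Check proposed link entries against the wheel's file list.

    record_names: set of archive-relative paths of regular files in the
                  wheel (as listed in RECORD).
    links: dict mapping archive-relative link path -> archive-relative target.
    Returns a list of human-readable problems; empty means all checks passed.
    """
    problems = []
    for link, target in links.items():
        # Containment: no absolute targets, no escaping the archive via "..".
        p = PurePosixPath(target)
        if p.is_absolute() or ".." in p.parts:
            problems.append(f"{link}: target escapes the wheel ({target})")
            continue
        # Walk chains of links to find cycles and dangling endpoints.
        seen = {link}
        cur = target
        while cur in links:
            if cur in seen:
                problems.append(f"{link}: link cycle via {cur}")
                break
            seen.add(cur)
            cur = links[cur]
        else:
            if cur not in record_names:
                problems.append(f"{link}: dangling target {cur}")
    return problems
```

For the versioned-library case above, an entry mapping `libcudnn.so.9` to `libcudnn.so.9.1.1`, with `libcudnn.so.9.1.1` present in `RECORD`, passes cleanly; absolute targets, `..` escapes, missing targets, and link cycles are each reported as distinct problems so an installer could fail with a useful message.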