Python Packaging Summit 2023 Notes

Short Link to Notes:
https://tinyurl.com/pypack2023

Session 1: Scope, strategy and synergy

PyPA & Conda: Past, present, future

Jannis Leidel

Slides: https://docs.google.com/presentation/d/1zr4R-UFoty3QNU3u1c5f_OgBq2JTBSSSwEcdTr116mE/edit?usp=sharing

Jannis: Introduction:
- Started with the PEP that added wheel support about 10 years ago
- Involved with Conda now:
  - Conda is a "full stack" package + environment manager.
  - More like Homebrew or Nix than pip
  - Packages are for more than simply Python projects
  - Big ecosystem with many subcommunities that are packaging for their particular niche
History of PyPA and Conda
- Separate but evolving along parallel paths
  - Solving similar problems
- Sometimes implementing solutions that fail, but trying again.
- In 2020, both projects created governance policies and improved collaboration

Open Questions

How do we tell end users when to choose which?
Which PEPs should conda be following?
- Should Conda follow any PEPs?
  - In the past, Conda has cherry-picked ones that they thought might apply and not followed others. But fear that there isn't a good set of criteria.
How could a delightful integration between PyPA and conda look like?
- Brett:
  - Where do the differences lie?
  - What gaps exist in one that the other one could plug?
  - Can we bring features over from one to the other?
  - How can we avoid maintainers having to duplicate work between them?
- Jannis:
  - One key difference is the YAML files that contain the packaging metadata for Conda is maintained in a different place as the actual source files.
  - This is more similar to Linux distro packaging than how Python packages from pip are managed
- Brett
  - That's a great point, as the approach is different. One treats projects as producers of source code (conda), while on the other side the same group produces both (PyPA)
  - Does this scale to a PyPA/pip size or is there a way we can piggyback?
- Jannis
  - We ask a lot of individual project mainatiners to do packaging themselves (on PyPA) and that creates a lot of pain
  - Is there a Conda-Forge like thing that can work in the PyPA world?
  - Conda-Forge is not without its flaws though
- Bernat
  - Isn't the core of the problem is that PyPI only wants to do the core Python depencies whereas Conda tires to do binary dependencies generally?
  - Maybe we can standaridze the project metadata so that Conda can just add its metadata on top of that in pyproject.toml
- Jannis
  - Want to jump back a little bit; Conda-Forge emerged in 2015 and pyproject.toml came out a few years later
  - This stuff might not be designed well and could be a mess, so we need to think about how we can improve it
  - To your point about binary/full stack dependencies
- Toshio
  - Earlier Brett asked if Conda-style packaging can scale to upstream Python projects
  - Coming from a distro world I can say that it is possible, but it is very labor intensive
  - The key is automation, e.g. someone made a script to automatically take everything off PyPI and create RPMs out of it
  - Not perfect but generally works really well
- Cheng
  - conda/conda-forge maintainers
  - When we're talking about Conda and Python, easier to think about how Conda interacts with Python packaging community standards
  - From the Conda side gotta think of it more like RPM or Homebrew than PyPI
    - Packaging much more than Python: R, Julia, System libraries ©, etc
    - Has recently implemented plugins to Conda where much of the language-specific needs can live
  - We kinda consider the Python community our home but are also very cognizent that if we set standard very Python-specific, if you look at the number of packages on Conda-Forge Python is only a relatively small percentage
  - Getting Python maintainers to try to write the correct infra for Conda Build is going to be a more effective model than trying to clobber every other ecosystem to fit Conda's vision of the work
- Filipe
  - I've been thinking of how to bridge this gap
  - First step to get Conda to understand wheels so they could be re-used as opposed to building a new Conda package, just modify the metadata
  - Could start with pure Python wheels and then move to more complex cases
  - Add more metadata in pyproject.toml so that Conda and other package managemenet infra can use it
  - Ralf Gommers and Pradyun have a PEP to specify via URL metadata the source ecosystem (PyPI, Rust, etc.) of a package
- Jannis
  - Looking at Daniel [Holth] a bit as well here
  - We shouldn't look at the metadata so much as to how they're structured, but how end users interact with them and how we communicate that
  - My question to Daniel: Could we see a convergence on one file format that could handle both ecosystems, and how do we handle conda being several steps away from the Python ecosystem
- Daniel
  - Third party in Conda and also in Linux distros is different than PyPI:
    - PyPI: one-to-one upstream project to (build?) but Conda one upstream could yield many builds
    - Could the conda distribution define a manywindows (manylinux-like for Windows)
- Pradyun
  - Conda is not so much like pip but more like Homebrew and friends like you said earlier
  - PyPI is both providing source distributions and also binaries, whereas Conda is building just binaries via its own stack
  - So I don't think re-using (binary) wheels would make sense as a lot of what gives Conda power is its consistent build env
- TP
  - Since Conda is closer to e.g. Homebrew and APT, and the general advice around mixing them with pip is essentially Use them for non-Python stuff, create virtual enviromnents and use pip for Python packages, maybe a better approach is to provide a way for users to leave Conda out of Python packaging?
  - In this setup we would encourage nesting virtual environments (or Conda environments that don’t have conda in them - ???) inside Conda environments containing non-Python dependencies, and use only Python-specific packaging tools instead of Conda in a nested environment to manage Python stuff.

Packaging Con Plug

https://packaging-con.org
October 26th to 28th, 2023
Berlin Germany

Could PyPA and conda collaborate on supply-chain security topics?
What pain points do you see in everyday usage?
Should we close the gaps in the conda/Python environment user experiance?

Framing the PyPA mission for maximum impact

Travis Oliphant

Reframing PyPA as the Python Packaging Association
- Instead of saying "This is how people should do things", [?] being a place for collaboration amongst those packaging Python.
- "Authority" name conveys too much, well, authority
Travis's Journey with Python
- Been invovled with Python since 1997
- From the scientific Python community where folks use Python as the second thing they do, not the primary
- SciPy was originally solving a distribution problem due to the problem of getting it installed
  - Only one way to install stuff way back then: .exes
  - This created a lot of technical debt, and the question is how you avoid that
- All about framing, if you frame it right better chance of solving the problem
- Created Numba, compiler that takes Python syntax and creates object code
  - Solves some of the same problems in a different way
  - Also introduces new problems because it compiles, therefore making platform-specific builds
- Was part of the team that created Conda in 2013
- Now founded a company Quansight that has a Labs component which has several people here
- Just to clarify, my words are my own
- Worked with Napari and had a lot of exposure to stuff, but still, there may be a lot I'm missing
There are many kinds of Python users
- Emphasize that Python users are a huge community with many different needs
- As Brett [Cannon] said, "I came for the language, but stayed for the community"
- There are a LOT of people using Python in a variety of fields
- A lot of Python users care about different priorities so one set of "expert" advice doesn't fit all
Different python users get Python in many different ways: distributions, dev environments, available with their work systems, embededed in applications and corperate environments
- For people to have a successful experiance, users need someone to take care of them and their type of problems
- If you try to solve everything one way, it will create many challenges
The PyPA should not try to support every individual user
- But it should support the integrators; the people/projects who are then supporting the indivudal users
- Python means different things to everyone, so one mesage doesn't work for everyone
- When users wonder how they install or create Python packages, they go to PyPA advice, but it may not be appropriate for their use case
- Comes up in buisness world, called "channel conflict"
- Right now, sometimes PyPA focuses on one set of users but ignores others because they unconsicously perceive them as being in conflict (Channel Conflict)
- Rebranding PyPA as the Python Packaging Association would go a long way to helping clarify and broaden this messaging
Specific harms due to current approach
- Channel conflict
- Complex AI and LLM environments become very hard for users to maintain
- Vendored wheels result in downsides of vendoring
- This results in more maintainer work
One vision for PyPA to support everyone
- Would argue that Python.org + pip is an emergent binary distribution (people getting their packages directly via pip)…
- … and that's the only group that PyPA seems to be supporting (well)…
- … but there are many other ways that people get python that we could address which make us more inclusive and stronger.
Specific suggestions
- Changing the name would help underscore that there isn't a single way to get Python
- Don't promote just one Python distribution and binary distribution mechanism/format
- Pip should be considered more of an API spec that could be overrable rather than a particular implementation
- Read Pypackaging-Native and Python Packaging: Where to next by Ralf Gommers
Jannis:
- We are mainly an association now, so we aren't that far away from this vision.
- We have a governance structure: we can propose the idea and call for a vote
Russell:
- Some users want an authority
- I'm an expert in this community and I don't know all the important answers either in the current messaging
  - E.g. Should I use setuptools or hatchling? Is one replacing the other?
Travis
- I agree with you, users want guidance, we just need to make better guidance
Sumana
- Been working on pip, PyPI, and PyPA tools previously
- Who has read/referred to UX (User experience) research on setuptools/pip? (Jannis raises hand)
  - https://pip.pypa.io/en/stable/ux_research_design/
- In the last year, the PSF has led some packaging strategy discussion. Has that been useful to anyone? (A few hands raised) https://discuss.python.org/t/packaging-vision-and-strategy-next-steps/21513
- Wanted to check whether those things have been useful
- UX research and developer experience work has helped make pip work better for users
- If we don't have UX researchers involved, we need to fix that
Pradyun
- Think it's wrong to say we don't address all of these.
- We are the source that all of these other distribution methods pull from
- What I do agree on is that there is a binary distribution purpose distinct from the source distribution purpose and seperating that might be a good idea
- The other piece is that there is a significant amount of effort put toward making sdists more interoperable with downstream workflow
- Main source of friction is the binary distribution story
- One privilaged binary distribution (wheel) isn't solving all the problems
- We also don't have a good story for Russell's point: solving the problem of which of many alternatives should be chosen
Bernat
- I think my take on that is that if Conda and the other distros don't agree with the messaging of the PyPa, maybe they should join and help drive the message
- We can have better defaults, maybe a better message is pragmatic recommendations depending on people's use cases
- Maybe we should update the documentation to explictly recommend Conda (or others) for specific niches
- Changing the name might not really solve the problem and take away some of the "authority": If we had less authority, would people stop following our advice?
Travis
- Well said
Kshitij Aranke
- I maintain a tool that is a downstream redistributor of Python
- Instead of taking the easy way out and changing the name, do the hard work and make it a proper authority
- Have read more discussions than average Python programmers, but not sure who is actually backing this
Pradyun
- One thing I will mention is that we intentionally put the strategy discussions up front so that people can continue the discussions throughout the rest of the conference

Scope and requirements for PyPA tools

Leah Wasser

https://www.pyopensci.org/blog/demystifying-python-packaging.html

Who am I
- Executive director of PyOpenSci
- Come from the scientific side of the Python community
- My background is in education in the scientific space
- Started with GIS and then moved to Python
- Building a diverse community in the Python-scientific community
Coming from the scientific perspective many of the challenges is working with maintainers to help them keep up with packaging standards and practices
- As someone newer to Python coming from R, I've seen a lot of confusion around packaging
- Sees many users looking for someone who can say "What is the right answer/tool/approach to this problem"
- Hard for PyPA to endorse a specific approach is because it is a very complex question that doesn't have a single answer
- In our guide, taking the choose your own adventure approach to helping users select the approach that works for their use case
- Point of confusion that comes up over and over again: What is a PyPA tool?
  - E.g. Hatch–is Hatch a PyPA tool? How did it end up there?
- Do tools that are in the https://github.com/pypa namespace have some special blessing that other tools do not?
- Should there be some requirements for tools under the PyPA umbrella
  - Bus factor - enough maintainers?
  - Are there criteria for which packages are located in the pypa? because outsiders assume there is something ("more stable, most featureful, official, are maintained, etc")
I think we should have a conversation around this topic so we can consider moving this forward
Pradyun
- Scope and requirements are "eh?" at the moment
- How new projects are admitted is via a vote of the PyPA members
- In general, the critiria is whether this is a low level tool that can be used in the Python packaging ecosystem to help other higher-level distributions distribute Python and packages
Jannis
- I appreciate what you just said about bus factor because this is the original sin that launched PyPA:
  - Ian was the sole developer of pip. Fixing that was the birth of PyPA

(remember that pip reinvented easy_install but said, guess nobody needs eggs (binary distros))? - dholth

Yngve
- I had to figure out how to package my first packaging on PyPI in 2018, not long alog
- It was a complete mess, read blog post after blog post and each with conflicting information
- Blogs didn't address the question should I use A vs. B
- For example, read the new guide on how to package, is amazing, but why was Hatch picked first?
- Pretty familiar with setuptools so why should I lean something new?
- There's no assurance that today's recommended tools will be the same down the road.

(just write one from scratch - dholth :) nobody likes this advice 🤔)

Filipe
- Leah did a really good job documenting that on the PyOpenSci packaging guide and we should have something like that in the PyPA documentation
Jeff
- When it comes to build infrasture, as much as we would like every Python dev to be an expert on packaging tools, this is never going to be the case
- Is just an overhead for most people until things break or they need to update something
- So its a good question to ask whether a tool will be maintained in the medium to long term and users can rely on being able to use them and it won't suddenly break or they need to switch
- For example, users need to have confidence that Hatch will be around in 5 years and users can rely on it for the long haul
- Its really expensive to change out build tools and switch to a different one
- This is part of what lets Rust be much more opinionated about Cargo
- Vs. Python sees such wide use across many different distribution methods results in packaging being very different
- Anything we do here and officially endorse should have some legs on it and live for a long time
Bernat
- I think this goes back to open-source maintainers maintaining these tools, so things are going to inevitably be under maintained
- For example, Setuptools has mostly 1.5-2 maintainers and the probability of Setuptools actually living in 5 years is lower than average due to the amount of tech debt
- I'm not sure open source has a good story about long term mantience
Toshio
- Building on what was said, I think we have several problems that are competing for space on this
  rather than to tools not to use
- For example, Setuptools has a limited lifespan, and we don't want users to learn it as their first tool
- Is Hatch going to be around in 5 years? We don't know, we don't have a crystal ball.
- One other thing is that some people need there to be a recommendation for one tool, so if we say don't use Setuptools but here are a bunch of other possible tools to replace it with
- Instead, if we say "If you don't like Setuptools, you should try hatch" this might help people get into the Python ecosystem
Leah
- A very common use case is users look at another package and just copy what they're doing
- We need to serve the scientific ecosystem, but we want to align with the broader ecosystem as well
Pradyun
- What I think I'm hearing is people want the PyPA to be more opinionated and have clear decisions and communications
- Personally, I'm wary of making any such recommendations that apply to everyone because it's not a monoculture like Travis said, there's not one way to do things
- Therefore, I'm wary of making that promise
Filipe
- Lots of overlap in each tool, but there are reasons one might use over another
- (So agreement with Pradyun that it's hard for PyPA to pick asingle recommendation)
- My recommendation is to use standard tooling when building stuff like build vs. using custom things like poetry build
- Then when you need to move the build backend it should be much easier
- Especially with PEP 621 (Pyproject metadata) makes that much easier
- To switch from setuptools to Flit if you have the project metadata in Pyproject.toml I think It's actually quite easy
- Can be more difficult depending on what you're doing (e.g. no native/binary stuff)
Brian Skinn
- More a packaging user than maintainer
- There's a difference in the mission of an association vs. an authority
- Many people want an authority and go to the PyPA looking for that, and don't find it
- If PyPA can't be an authority, we should make it clear that PyPA isn't actually one
- Alternatively, if the PyPA can be an authority, it should identify what would be needed to make that an actuality
- Trying to ride a middle course is more harmful than choosing one and going with it
Leah
- Just a quick note that there's a maintainer summit this afternoon

Python packaging on mobile platforms

Malcolm Smith

Works on Chaquopy, Anaconda, and Beeware
Python support plans on mobile
- Android and iOS have tentative approval to become Tier 3 supported platforms
- For a new Python platform to become viable, it needs to have a packaging ecosystem
- Need to get a positive feedback loop going between app developers and app users, as more of one results in more of the other [chicken or the egg problem]
Current situation
- Beeware has a build tool and package respositories for Python on mobile packages
- chaquopy repositories contain wheel builds of packages for mobile platforms, which often require minimal patching
  - https://chaquo.com/pypi-7.0
Problems
- Scalabiility
  - Only two maintainers, simply don't have the bandwidth to provide all the packages people ask for
  - Also a hassle to manually update existing packages
- Missing tool features
  - Cannot easily install non-Python build tools
  - Or download build time requirements
  - This requires manual setup making it difficult for others to use this
  - Realized that adding these features starts to make us the same as conda so…
Vision
- Would like to explore the idea of moving these builds to Conda-Forge (CF)
- Conda would build conda packages which we would convert to wheels for users to consume
- Would move to a more decentralized model of CF where each package has a specific maintainer that has knoelge of it vs. depending on a handful of common maintainers
- Also want to find a way for upstream developers to produce thse builds themselves, e.g. through cibuildwheel without having to become packaging experts
Discussion topics
- What would be nessesary to add mobile support on CF
- For PyPI, what would be required?
- Any thoughts about Conda->Wheel conversion
  - E.g. Conda-Press was a tool for that in the past; should we bring that back?
- Are there other strategies to address this overall problem?
Russell
- PEP 517 is great providing an abstract interface to build packages, but doesn't support cross-building
- This is a requirment on iOS and Android; cannot build packages on those platforms
- Not a standard way to define cross-platform targetting through PEP 517
- A lot of issues here are related to those WASM also hits since it also requires cross compilation
Filipe
- Cross plaform builds is something I'm interested in
- My plan is to get better support for cross builds in CPython, then propose a cross_build_wheel hook that supports a set of information about the targets you want to build
- Without support in CPython first I don't think anyone will adopt it
Malcolm
- Would really like to see better CPython support for cross building
Jannis
- Conda Press was not a good idea because it is a hack on a hack on a hack
- The Conda open source project is interested in solving this together
- There is another project that is also focused on this that also has similar problems
- Want to get the Python core team on board with this
Pradyun
- Can confirme we discussed this at the language summit
Henry
- One of the cibuildwheel maintainers (etc)
- We are currently adding WASM support to allow building a wheel you can use in pyiodide, etc.
- What is the distribution model here, is it wheels you put together and then distribute somehow, or would you be downloading wheels on the device itself?
- Malcom: The approach beeware takes is pre-building the packages, and then distibuting them
- With PyPI building wheels have requirements e.g. adding a tag that ensures your builds won't break in the future, but if you're building your own distribution you don't have these restrictions
- (Russell) Apple's App Store guidelines make this difficult in terms of being able to download code with pip
- (Pradyun and others) concur
- In terms of cibuildwheel we could defintely look to support this down the line
- In terms of cross compiling would certainly like some work on cross compiling standardization but is going to be difficult
Pradyun
- The cross compiling support needs to be built into CPython
- Coming back to the topics, do we have a clear answer on what needs to happen in CF?
Jannis
- The answer right now is no, we'd need to figure that out
- CF is seperately governed from Conda so we'd need to determine that with them
Henry
- There's also a WASM-forge for emscriptem so maybe we should coordinate with them given they face smilar problems
Pradyun
- Something that came up at the WASM summit is that Python does not have an application building story
- This is something that WASM needs, where users can throw something into a Docker container and ship it off
- The WASM summit was saying they don't want to own that story, but someone has to
- I think it needs to start from CPython up, and then propogate that information through build backends so this is going to be a long tail
Henry
- Cross compilation is becoming more important due to cross compiling ARM vs x86-64 on macOS
- The compilers can do it but we can't since there's no way to tell them that right now
Malcolm: In terms of application distribution that's one thing we're trying to do with the Beeware project.
Simon:
- Took weeks to figure this out, it was eventually possible but very difficult
- Deno, the JS package has had a feature for a while that has a feature that bundles node as a standalone thing
- Sure, means that there is big binary just to run hello world but that's inevitable

Overflow time

Sumana:
- My question-thought: As I was looking at your (Leah) diagram, do we need to just put something into the new-user flow that says "Ask for an expert's advice here"
- For health, right? We assume people are common case for most of their lives but hethere are times when you need to pay someone for expert advice
- Do we have a general sense when we need to just say no and tell people that the docs won't be able to solve it and users should go get expert adivce
- Like Python's PEP 11 tiers, bounding that and giving tiers of user support in terms of what the docs and general community will be able to help with
- My assumption is that people don't realize that they are edge cases
- When I think about this using the analogy of health problems, yeah there's a lot where we can give people self help tools but we should put out guardrails telling users when to "see a doctor" for harder packaging problems
- Or this is where you need to actually pay someone for expert advice and not depend ong volunteers; this seems to be a boundry we could set
Y
- My background isn't science, it is the car industry
- I see almost everyone are not Python experts, they are mostly using Anaconda because that's what's installed
- They just pip or conda install and don't really know why they are using one or the other
- We need more documentation on this
- 90% of all scientists maybe install Anaconda, and then they try to pip install something and they get a crash
Darhas
- With Quansight and I've been using Python since 2008, mostly on the Conda side
- One of the big problems I have gets back to the whole PyPA authority and how the website is designed, is when I'm going into a company and tell them what they should do, they say no, the PyPA says this
- The phrasing on the website is a big problem in terms of not being as clear that there are multiple communities and you may need to consult with them instead
Pradyun: What I'm hearing is that we need to improve the docs, help welcome
Sumana: Big problem is we need SEO to get rid of the billion old blog posts that recommend super obselete stuff like eggs. Could we pay someone to help get rid of those
Toshio: What if we have on the packaging.python.org docs one section that is non-opinioned, and one that is opinionated, and we make clear which is which
- We should curate things much more heavily on PyPA
- If we can provide a source for good quality "blog posts" we can help combat all those blogs
Leah: Our organization is supporting scientists, so we'd love to see more involvement there; We'd love to see more community interoperability in terms of being able to use different packages
- That's why we've preferred PDM as it allows different backends
- Maybe there could be a content vetting process that would allow PyPA to link, cross-reference or host content specific to specific disciplines
TP
- I see there are two competing forces, should PyPA become more of an opinionated authority or an accepting association
- Does anyone agree we should re-invent PyPA as an authority and create something above or below PyPA that is an association? (Or the other way around, rename PyPA and create an actual authority.)
Filipe: Don't forget about the second session, 1PM Sunday same room
Pradyun: The reason we're doing this now is we are having these conversations throughout the event.
Filipe: If anyone has any thoughts about sysconfig, come find me.

First session ends

Session 2: Technical Proposals

Dynamic Metadata Plugins proposal

Henry Schreiner

Henry's Presentation

Session goals and non-goals
- Judge how backend authors feel about this proposal
- Get input on API (and other) design
- Non-goal: Discussion about whether dynamic metadata is a good idea
Would result in a series of PEPs
PEP A: Partially static dynamic metadata
- Current: Metadata fields must be fully static or fully dynamic
- Proposed: Metadata field that takes arbitrary entries can be (only) extended dynamically
- Main thing that this would change is that tools would need to consider static metadata that may be extended if dynamic is set
- One main use case would be with pinning NumPy versions at build time as in with Meson-Python
PEP B: Dynamic metadata plugins:
- Most backends have some dynamic metadata now.
- This means the tools that build the wheels from source needs to be aware of the dynamic metadata
- Users may not have chosen a backend for a different reason than the choice of which metadata is dynamic
- Some existing plugin families serve these needs, e.g. Setuptools-SCM, Hatch-VCS, etc.; would help if they could share this work
- Other plugin use cases
  - Fancy README
  - Generating build requirements (for instance, when using NumPy headers, the build needs to depend on the specific version)
API:
- dynamic_metadata(fields, settings) -> dict of metadata fields
- get_requires_for_dynamic_metadata()
- get_requires_for_dynamic_metadata would be the backend-side hook to install any nessesary plugins at build time
PEP C: Dynamic Metadata Plugins in pyproject.toml
- Currently each plugin has to come up with its own config/communication mechanism
- This would provide a standardized mechanism to invoke and configure plugins on the user-side along with declaring the relevant fields as dynamic

Discussion

Brett:
- Have we thought about listing an "incomplete" value instead of dynamic to specify that metadata fields may be extended?
- Would improve backwards compat and be self-documenting
Henry:
- This would complicate the pyproject.toml config as we'd have both dynamic and incomplete and both could be tables or lists
- The repercussions may not be that bad
Brett: This would affect backward compat as tools would flag metadata being both dynamic and specified
Henry: Well, backends would need to be updated anyway to support this, so what other tools are validating this
Brett: Right, though I've learned not to assume what other tools might be validating this
?
- Dynamic only makes sense for list fields?
Henry
- All arbitrary lists and tables
Pradyun
- For PEP A, design is fine; less concerned about backward compat
- PEP B & C: we could throw this in as something that backends could use instead.
Henry
- So for B, we could potentially negotiate this between backends without a PEP
- However, standardizing this would be a requirement for PEP C, adding the config in pyproject.toml
- This make this more complicated for tools in the long term as they need to check multiple places
Pradyun
- I'm not sure we really need PEP C as if the backends all share the same implementation, backends could all re-use the same [tool.dynamic_metadata] section without the need for a PEP
- The meat of the thing is to get all the backends on board, so if we can do that, we can just have them agree on a [tool.*] section
Henry
- That might be a better idea is to build that tool and then propose a PEP later, didn't fully think about that
Filipe
- Is it that common that you have to add extra dependencies rather than just constraining specific ones
- In my experience with Meson-Python the issue is constraining existing dependencies for the wheel as it can only work with specific versions of NumPy that its built with
- Maybe instead introduce a concept of an API for packages specifically tailored toward this use case
  - E.g. NumPy declares an API tag that communicates its compatibility for built wheels
- Are there other large use cases that would make sense to add the complexity of this proposal?
Henry
- The main one for dependencies is the one you mentioned, but there are others for other fields such as classifiers
- If you have a tool that adds classifiers, then you have to move all your existing ones to dynamic
- Entrypoints is another potential big thing that people asked for in scikit-build-core
TP: It probably makes sense if you can get the backend authors together we could come up with something without going through the PEP proces to avoid overwhealming it
Henry: I'll try to work on PEP A and then B and C can probably be done through a library for now
Bernat: If you do need additional dependencies, wouldn't that need to be done through the fronedend?
Henry:
- Yes, the frontend asks the backend for the dependencies it needs to install
- Really useful for wrappers and similar
- Sensitive to how things work to avoid injecting deps like CMake everywhere

Namespaces on PyPI

Dustin Ingram

Slides: https://github.com/di/talks/blob/master/2023/pycon_us_packaging_summit_2023/talk.pdf

One of the PyPI maintainers
Why do we need namespaces?
- quarter of a million projects
- A lot of old stuff and a lot of contention for project names
Problem: PEP 541 (how to remove old pypi projects)
- Incredibly onerous and boring for the PyPI admins
- Long backlog of requests as it requires extremely high trust
Problem: Typosquatting is another problem
- Creating project names on PyPI with similar names to prominent packages
- Have a current policy that prevents this but also catches up legitimite projects too
Problem: Dependency confusion
- A lot of different approachs to dep confusion
- One is to just create a namespace that only you can publish to
Problem: We now have organizations
- Just officially announced this morning
- Allows orgs to more easily publish to PyPI
- Namespaces are the second most requested feature after this
Idea: GitHub style namespaces
- Has both user and org namespaces without collisions
- pypi/repo and di/repo don't collide
Idea: Look at what npm did; they implemented namespaces, see how it worked for them
Idea: Should not have breaking changes
- All installers should continue to work even if things are inside a namespace

Discussion

Sumana: used to be PyPI project manager
- Shamika (current PM) has done work on user experiance on PyPI
  - Has she provided any guidance on how namespaces should work?
Dustin
- Her work has focused on organizations and not namespaces
Sumana
- Back in 2020 when we did the pip overhaul we did tons of UX research to ensure that the changes follow user expectations and mental models
  - https://pip.pypa.io/en/stable/ux_research_design/
- The better we understand user mental models the better things will go
Dustin
- I don't expect this to be a quick PEP, this will be a longer project that will require funding and part of that will be for UX research
Phebe
- As a contributor to pytest/maintainer of pytest plugins, think this is a great idea since pytest plugins are like a microcosm of the larger problem for pypi.
- How would namespaces work if you were maintaining your own package index/mirror?
Dustin
- I'm not totally sure how this would work yet with that but we would need to understand that first
Phebe
- As a Pytest plugin mainatiner would want to know how I could publish my plugins under their namespace
Dwight
- It should be okay for a namespace to be empty if it is placed on a public repository to avoid typosquatting
Bernat
- Once a namespace is added, can we take ownership of other packages which are currently not in the namespace
- On the other side, as a corperation I would want to own everything under that namespace
- But as a FOSS maintainer I would like to own certain packages under my namespace but allow others to publish other packages under the namespace
- Would be nice to have a way to approve or not other packages under that namespace
- Would be good to have a way to indicate whether an namespace takes ownership of something under their namespace
Dustin: I think that should be up to the repo owner and we should follow GitHub's model here
Peter:
- Weak coupling to all the other packages that exist but have the name of the organization in it
- Hits the big tech vendors quite a bit
- Now you've got a Nvidia or AWS org and then all those packages that are not affliated with that
- Organizations are going to have the expectation that they own their namespaces
- What happens for projects which have no relation to the owner of the namespace but which adds onto the project in the namespace?
- So many orgs have their own internal mirrors and infra and want the final say on the resolution order there
- This converges on something like PKI or DNS
- When orgs run GitHub enterprise they run a proxy system to shim access to outside GitHub
- Do we need to look at federation systems like DNS?
- When we look at a future with package signining and supply chain security we need to look at going in that direction.
Dustin
- Its important if we go down this route, we need to make a clear distinction between namespaces and just parts of the project names
- Some package names have a prefix that indicates an organization, and some don't, so we need to simply that
- This leads to a need for a new syntax for that
Pradyun: One quick comment: Mentioned that GitHub namespaces allow overlap between users and org names
Dustin: This was referring to repo names under those names rather than an org and a user name being the same
TP: How many people are familiar with how NPM did this? (around 5 hands raised)
?: How do you verify namespace names against the names of organizations
- Dustin: Will need to figure that out
Toshio:
- I think this hits on a problem with Python
  - How do we de-conflict import names, PyPI names and names in other packaging systems (e.g. Fedora)
Dustin: That's defintely a problem but out of scope here since its a problem that already exists
Toshio: I think it will get worse with namespaces as its easy to search PyPI right now to see what names are already taken, and namespaces would make that harder
Itamar
- As a big tech user I would like namespaces to be attached to an org
- Would like aliases so we could typosquat ourselves, essentially
Dustin: Would allowing registering multiple empty namespaces avoid this issue?
Itamar
- It would from a security perspective but not a usability perspective
- Itamar thinks that users would expect to get the same package installed if they used the alias name or the canonical name
Dustin: Yeah, I think there would need to be somethin done with aliases
?:
- What is the plan for big companies who have a lot of traffic on PyPI to use to finance the user of the community infrastructure
Dustin
- So I think the assumption in this question is that heavy user is a problem right now
- It can be a problem but not something that keeps me up at night
- There are some users which can potentially be problematic and we would need to work with them about that
- For example, all the GPU binaries [Tensorflow, etc.] but I think that will get better over time
- We already protect against overconsumption with the limits we have
Sumana
- It seems there are different ways we could organize this that could change the level of cost this would require
- Therefore would need some idea of how much revenue is availible
- We could determine the best way of doing it and then find a way to fund it
- Alternatively, we could think about the best approach given a certain availible budget which might be slighly less effective but could be much less expensive
Dustin
- Yes, we are currently looking to hire someone on the PyPI side who will work on security

New sysconfig API

Filipe Lains

Evil plan to get better cross compilation support in CPython (clapping)
Issues with current sysconfig
- Introduced in Python 3.2, long time ago (10 years ago)
- Hasn't gotten any major API changes over that time, just additions that we needed
- Mostly been developed ad-hoc rather than through making deliberate and throughtful design decisions
- Lot of the information is via get_config_vars() which just exports the variables from the Makefile (many are not relevant)
- We are in this wierd place where most core devs don't really know that if they change variable names in the Makefile, it will break lots of stuff
This is a good escape hatch mechanism as you can get all kinds of information with it but for the very important info (compilation, packaging info, etc) we should have a proper API for it
- Installation paths and locations is an old style of solving this problem. Newer ways of solving this will solve more problems in a more elegant way
  - For example, recently had a proposal to add __pypackages__ to install local Python packages (NPM style)
  - That proposal doesn't really make sense to have a scripts directory as things won't work as expected if you add that to the path
  - How would we expose something like this in sysconfig?
Another issue that some paths are shared between all installations
- PLATINCLUDE and INCLUDE paths are the same, where the python interpreter is installed.
- When you create a new venv it creates a layout with purelib and platlib but if you need to install to include paths it will be availible globally
  - E.g. NumPy installs headers which end up being global
  - If two pacakges install headers with different versions they will conflict
The model for the paths and install locations is not really that great and the way things have evolved has changed from when sysconfig was introduced
- Back when venvs were introduced would copy everything including stdlib
- When we moved toward smaller venvs then it creates the above issues that were overlooked and nobody fixed
Cross comiplation support right now is very poor
- When compiling a package, information is in a lot of different places
  - sysconfig
  - platform
  - sys
  - Others
- Adding support for cross compilation requires a lot of changes in your code due to how much this information is spread out
Plan to tackle that is make all the information required for this available in the sysconfig and object-based
- Would be a single function that would return an object containing all the required information
- Would then make supporting cross-compilation in your code as simple as replacing that object with one containing the target details
- Would then allow creating a PEP 517 style hook to provide the information, which the user could either provide for the target platform or else get from the sysconfig module of the current Python
- Sysconfig API itself could use a data file to get the current information which would allow statically determining the details of the current interpreter, like they do in Rust
Overall, wanted to get some feedback on sysconfig and anything to do with compilation to ensure what we are doing is compatible with your users and is an improvement for you
Have a couple issues open for these different aspects
- Meta issue tracking the information we want to expose https://github.com/python/cpython/issues/103480
- Very similar to a thread Brett opened regarding the Python Launcher for Unix: https://discuss.python.org/t/what-information-is-useful-to-know-statically-about-an-interpreter/25563
Could allow building Windows wheels on Linux and vice versa

Discussion

Matthias
- First, sysconfig contains a lot of useless information on how the interpreter itself was build, so for a new API we should not include this
- Second, like the idea of having everything in sysconfig but unfortunately its not as we also have platform module which duplicates it
- Third, easy to just replace sysconfig data from the target build platform
- There is already some prior art in a cross-compilation PEP.
- Cross-compilation should coordinate with people who have already worked on it (Matthias from Ubuntu, other Linux distros/Posix systems)
- So I suggest that there's a cross compilation PEP prior to comitting to the new API so it can serve those needs
Filipe
- I don't think just replacing the sysconfig data is enough to make cross compilation work, you need to patch a lot of other stuff in Python itself
- People building stuff that targets differnet platform will look at that stuff in sys and lots of other places
- Not sure what actual things we can do in a PEP for cross-compilation, do you come from a build backend side or CPython?
  - If you come with CPython what other things can do you than just compile the info that cross-compilation people might need
  - If you come from the build backend side…
Pradyun
- I think a PEP is useful as it provides coordination among the parties involved
- Even just having something clearly written down and agreed upon is useful
Filipe
- I definately agree with having a PEP to write this down
Russell
- Is this still at the phase of looking at how this data is used or have you started prototyping?
Filipe
- Yes, I've started prototyping but want to gather more information on how sysconfig info is used for cross-compilation
Pradyun
- How disruptive do you expect this to be, how many people to you expect to have to make changes and who do you expect this to be?
Filipe
- I don't expect this to be too disruptive as the changes will happen as people adopt the new API and the old API will stay around for a while
- Advantage of adopting the new API will allow cross-compilation to work almost be for free
Pradyun
- Disadvantage seems to be that we would have duplicate information in several places
Filipe
- Yes, that's the main worry as this would lead to some duplicate information
- There are already places that have the same information but in a wierd way
- So having one place where you can get this information in a more concise way is good
- Defintely the main drawback of this API but considering the improvements that this would bring is worth it

Distributing CLI tools without venvs

Kshitij Aranke

Slides: https://www.aranke.org/slides/2023_pyconus.pdf

Going to be a very different talk than the others as it won't be very technical and will be rather about our experience
Software engineer at dbt labs, dbt-core maintainer and have been using Python for 10 years since college
Am an outsider to the python packaging community
- Don't know much
- Packaging is a (sometimes painful) job to be done
- Want to work with others to find solutions to this
Not really rewarded for successful packaging; it's just to get my real work/project distributable via pip.
- So want packaging to be uneventful and not need to go to Packaging summit, so I can just get my job done
What is dbt?
- OSS CLI tool
- Distributed via PyPI
- For non-Python users
Growing pains
- Explaining what a venv is to a beginner
- Version resolution conflicts with other packages very common
- Lack of information and curation of it (e.g., on packaging.python.org)
Don't have solutions myself, just listing the problems and asking you for me
- Think the namespaces thing from earlier is pretty cool and might solve some of our problems
dbt <3 Python
- Well understood in the data community
- Allows our users to contribute back because some of them do know Python
- Great libraries to build on top of
Let's work on the pain points together
First one Virtualenvs
- Break flow state
  - Forgot to activate venv; how to fix?
  - Workflow unfamiliar to npm users
  - How to have multiple venvs
- You can use X from PyPI
  - Can I trust the developers of this tool?
  - Why isn't this recommended by python.org/some authority?
  - How can I continue to trust this tool into the future?
Version resolution conflicts
- 3 million downloads per month and many required dependencies
  - Don't know which of those are C; it should just work
  - dbt (i.e.: package developer) thinks they shouldn't have to think about which dependencies are pure python or compiled
- Users complain that dbt locks things down to the patch version, which makes it extremely difficult to use in an env with any other tools in it
- Users try out new dep versions as part of trying out a minor new version [of dbt]
  - Clear channel for feedback
- Very hard to install alongside other packages
- So far, have accepted these tradeoffs
- For example, one dep released a patch version that broke dbt and still don't fully understand but just solved with hard pin for now
Distribution solution
- We need a distribution solution that is
  - Deterministic (for consistent results)
  - Well understood (users shouldn't have to be a packaging expert)
  - Easy to deploy (dbt runs in many places)
- Docker install is the only solution we found that works reliably and have pushed people toward it instead of using pip/PyPI
Lack of information curation
- No recommendations for which tools should be used.
- Example: Chris Warrick article was helpful on this
- Difficult to keep up with best practices/recommendations/etc. if you're not an insider.
  - E.g., packaging strategy discussion part 1 was 200 posts long and reading it is not my day job
- What does UX look like in 2023: there is a UX study in the pip documentation, but there's no roadmap. What is the status of this.
- Where is the current, up-to-date canonical source of information? I don't know.
Summary: Going to hope for the best and plan for the worst
- Going to use either pdb or Poetry and Docker

Discussion

Brett
- What would you need from pipx to be able to recommend it?
- Have you tried to get your tool into the package managers like Conda and distros
Kshitij
- pipx: dbt depends on having some of its plugins (separate packages) to run.
  - Would have to inject them into the pipx state
  - Not shipped by default so kind of have the same trust problem
- Get into package managers
  - Would have to ship dbt-core + plugins each combo as separate packages
Brett
- Just verifying—what do you tell users, install dbt-core then install the plugins, use extras, install them all at once, etc?
Kshitij
- We do ask that user install them separately
TP
- DBT and airflow together is a problem, but it is airflow's problem.
- astronomer: Only solution is to not install them into the same venv
Kshitij: Also what people said on Twitter as well
Henry
- One thing you can do with pipx is you can tell it to just install that one command for each plugin
- Is a backend vs. frontend thing there, the backend doesn't affect any of this
- If you're looking at their docs it would be for Hatch, Poetry, etc. that would be a frontend thing
- If there were some way for a package to say here are the pinned deps and here are the library deps that would be helpful (pipx could install one and pip could install the other)
- In terms of the 80%, I think maybe you wouldn't be in the 80% based on what you've said, so you have to consider the people that wouldn't end up in the 80%
Peter
- dbt is a really big deal in the data ecosystem
- So your pain points are important.
- If there isn't another solution given by upstream, my solution is to ship a machine image (container, vm image, etc.)
- 10 years ago, the solution for Juptyer was to literally ship a machine image (Docker) and tell Windows users to use a Linux machine image
- Several personas/use cases to try to get to 80%
  - Could be installing Python package because you've been extending a tool that's been given to you; need pip dependencies so that things will be coherant and then make your own package on top of that
  - Others who use dbt as a SDK and build other things on top of that
  - Snowflow literally paid Anaconda to build their own cloud/distribution to ensure that everything would work
- Want great UX for both devs and users but need to identify the different use cases first

Overflow time

Pradyun
- Thanks for the notes! They're important because not everyone can come and having notes will keep everyone informed
CAM
- 10 second plugs for talk to me after
  - PackagingCon
  - Come to the sprints to help use improve the packaging docs!
  - Kshitij: Talk to me after

2023/04/21 17:35:19

Kshitij Aranke (Edited)

Daniel Holth

2023/04/21 17:37:55

Reframing PyPA as the Python Packaging Association

I strongly dislike the Authority name. Personally I find the Authority name to be alienating and a discouragement for new ideas. (Edited)

2023/04/21 17:59:57

Chaquopy, Anaconda, and Beeware

is a setoftools (Edited)