Opening questions

  • Raise hands by experience level with packaging: beginner, intermediate, advanced, expert
  • Raise hand if you've worked with compiled packages
  • Raise hand if you regularly use pip and/or conda
  • Raise hand if you find packaging one of the biggest pain points with Python

Schedule

6:30 - Welcome, a very brief intro to the topics and progress in the space, and taking the temperature of the room with the opening questions
6:35 - Providing better guidance and documentation to users (plug the Scikit-HEP guide & repo-review)
6:50 - Share experiences and ideas for pip and conda interoperability
7:05 - Discuss experiences with, improvements to, and questions regarding compiled build backends (scikit-build, meson-python, etc.)
7:20 - Free time for additional questions, discussion topics and short plugs
7:30 - Session ends

Welcome, brief intro to the topics and progress in the space, and taking the temperature of the room

  • Raised hands by experience level:
    • Expert: 1 person (Henry)
    • Advanced: About 5 hands
    • Intermediate: A half-dozen hands
    • Beginner: A dozen hands
    • Overall: around half beginners, most of the rest intermediate, and a few experts
  • Do you work with compiled packages:
    • Working with compiled packages: About half the room
  • Raise hand if you regularly use pip / conda:
    • Almost everyone for both
  • Do you find packaging one of the biggest pain points with Python:
    • Almost all beginners and about half of the intermediates

Providing better guidance and documentation to users

  • Henry: This is in response to a community question. Has anyone heard of the Scikit-HEP developer guide? (a few hands)

  • Guides on using various tools, like pre-commit, pytest, etc.

    • Gives a guide to modern, standard packaging with various backends
    • Page on using and creating compiled Python packages
    • Suggestions for lots of tools (linters, type checkers, formatters and more) and how to set them up
    • More advanced things like setting up GitHub Pages, GitHub Actions, etc.
    • Also covers using task runners (tox/nox), pre-commit, etc. (see the nox sketch below)
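
  As a minimal illustration of the task-runner pattern the guide recommends (the session names and the "test" extra below are assumptions for this sketch, not taken from the guide):

      # noxfile.py -- a minimal task-runner sketch
      import nox

      @nox.session
      def lint(session):
          # Run every configured pre-commit hook against the whole repo
          session.install("pre-commit")
          session.run("pre-commit", "run", "--all-files")

      @nox.session
      def tests(session):
          # Install the package (with a hypothetical "test" extra) and run pytest
          session.install("-e", ".[test]")
          session.run("pytest")
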
  • Two other pieces of the guide are also important

    • A cookiecutter template that supports both copier and cookiecutter, as well as 12 different build backends
    • Additionally, the repo-review tool with a set of scientific-Python-specific checks; you can enter your repo and it will run a bunch of useful checks against it, with links to help info for each check that fails
  • Anyone else have tooling that complements this, or can be merged into it?

    • Does the cookiecutter project already generate GitHub Actions workflows to build your project?
      • Henry: Yes, it will generate both GitHub Actions and GitLab CI workflows to build your package and publish its artifacts
    • What are the default jobs in the cookiecutter?
      • Pre-commit checks, building the project on different Python versions, and a CD job to build and deploy release artifacts
    • Does the guide cover checking, generating and managing release notes with different options for that?
      • Would probably fit under the topical section, going over the main options and recommending one
    • You said there were 12 build backends; could we pare that down a bit?
      • Yeah, it could probably be pared down a bit, though it does clearly recommend different options for different types of packages
    • Does it also include conda-forge generation?
      • Not in the cookiecutter, since it's in a separate repo, but this can be mentioned in the guide and can be auto-generated using grayskull
  • Do you include tests in your package, or in a separate folder?

    • Henry: Fan of having a few tests in the package to test the binary, and a separate test directory for the full test suite (see the sketch below)
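
  A sketch of that split; the package name "mypkg" and its compiled "_core" module are hypothetical, not from the discussion:

      # src/mypkg/tests/test_smoke.py -- a tiny smoke test shipped inside the
      # wheel; the full suite lives in a separate top-level tests/ directory
      import mypkg

      def test_extension_imports():
          # Importing the compiled extension catches most ABI/linkage problems
          import mypkg._core  # noqa: F401  (hypothetical compiled module)

      def test_version_is_string():
          assert isinstance(mypkg.__version__, str)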

Share experiences and ideas for pip and conda interoperability

  • There's a hatch-conda plugin that integrates Conda environments with pyproject.toml for task-runner functionality
    • There's also the aforementioned grayskull tool to convert PyPI metadata to conda-forge recipes
  • Mike Sarahan (conda dev): Want to give you guys some historical context
    • Conda has a very different scope of metadata, as it captures binary dependencies unlike pip
    • Those are two totally different scopes of problems
    • There were originally some proposals to allow that with PyPA metadata, but they didn't really go anywhere at the time
    • If the PyPA maintainers aren't interested in addressing that problem, then I don't blame them
  • Henry: One example issue was raised in a talk yesterday regarding OpenMP in wheels, where it was almost impossible to do on PyPI but was easy with Conda
    • Mike: Yeah, pretty much all the pip maintainers will tell you not to try to shoehorn shared libraries into PyPI, as it's not a good solution
    • (Talk presenter) We don't want to do this, and it's not easy, but we don't currently have a better solution, so we have to do it; we'd really like a better one
    • (Mike) You have to assume metadata that isn't there, and just hope it works
    • (Talk presenter) Right, because that's all we can do
    • (Mike) You effectively need some way to coordinate among all dependencies to use the same dep versions, which is what Conda does. If you look back at the history, Travis asked Guido how to handle binary deps in the scientific Python ecosystem and he said "do it yourself", which is what happened with Conda (https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/)
  • (Henry) Right, though things have changed a lot since then, with the PyPA much more interested in solving this issue and tools and the ecosystem being in a very different place than even 5 years ago
    • (Mike) Good luck :)
    • (Leah) This would mean basically redoing what Conda did 10 years ago, which raises some concerns as to whether it is necessary
    • Is it worth trying to tackle these problems and push PyPI from a 98% solution to a 99.5% one, which might just make things worse?
    • (Ryan) Be careful when you say "PyPI says", since that's a rather nebulous thing and it's mostly a question of metadata standards
  • (Brian) As to the question of mixing conda and pip, the idea is you shouldn't mix them, or at most install the conda stuff first and then the pip stuff following a careful flow, and if it messes up, delete everything and start over. It's never going to be perfect, but are there points of that brittleness that can be eased? (A small introspection sketch follows at the end of this list.)
  • Why is anyone still using pip?
    • Because it's convenient, common, fast, and easy to use for most things
    • All the people I hear using it are web people
      • Well, there are a lot of those people
    • You're still going to need it to actually install your own package, and Conda uses it too
  • Biggest problem to overcome is the bootstrapping problem on the user side
    • (Leah) Mambaforge can make this easy
      • This greatly helped reduce problems for students getting started, especially in GIS, the field I work in
    • Is there more work to be done to make this easier?
    • Could we actually ship mamba with Python?
      • It would have to be the other way around, and we have micromamba for that already
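
  A small sketch related to the mixing question above: each installed distribution records an INSTALLER file in its dist-info directory, which pip writes as "pip"; whether a conda-installed package records "conda" there depends on how it was built, so treat the output as a hint rather than ground truth:

      # Sketch: report which tool recorded each installed distribution
      import importlib.metadata

      for dist in importlib.metadata.distributions():
          # The INSTALLER file is written by the installing tool; it may be
          # missing, so fall back to "unknown"
          installer = (dist.read_text("INSTALLER") or "unknown").strip()
          print(f"{dist.metadata['Name']}: {installer}")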

Additional questions, discussion topics and short plugs

  • Talk to me after about application packaging! (Which is a whole other complex ball game)
  • (Mike) One last thought: A lot of this stuff is a non-starter because upstream Python has to deal with distros and re-distributors
    • Furthermore, PyPI's self-governed model is very different from Conda's, which can work better for smaller projects
  • Libmamba will become Conda's default solver in September
  • Sprints if you're interested in packaging:
    • Scikit-build-core will be hosting a sprint with Henry
    • Conda-Store (a platform to share, sync, and maintain conda environments) will be having a sprint
    • Conda sprint is planned
    • Scientific Python will have a sprint on their guides, including packaging guide
    • Mike will be available if people are interested