# URSSI Summer School

**Agenda and details:** https://github.com/si2-urssi/summerschool-July2024

Zulip space: https://urssi-softwareschool.zulipchat.com

## Research Software Practices

### Software Design

Core ideas:

* **decomposable** -- software should be broken down into modules that you can reuse
* **composable** -- each piece of code should combine with other modules and work together to form cohesive functionality
* **understandable** -- each module should be understandable on its own. A human should be able to read what it does without reading the other modules it works with
* **continuity** -- a small change should affect a small number of modules
* **isolation** -- an error in one module should be as contained as possible, so that when you do get an error you don't have to sift through a confusing stack trace

Generally, variable names should be descriptive. A **self-documenting** variable name enhances readability. In legacy code, where older languages or memory constraints limited name length, choosing a shorter name was valid, but in modern languages this is not an issue, so try to avoid variable names like `A` or something ambiguous like `foo`.

**Working with legacy codebases**: if you inherit a big codebase, start with a roadmap before you make any changes. Then you can start figuring out natural ways to decompose the code. This also forces you to build a mental model of the software.

**Future you** is one of your teammates. If you make your code easy to read now, future you will appreciate your past self's collaboration.

When you run into an issue, try to create a minimal working example of the code. It helps you and other developers see how the code is being used without the burden of other functionality. This can also help you recognize whether you're causing the problem, and if so, how.

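As a small illustration of the naming and decomposition advice above, here is a minimal sketch. All function and variable names are hypothetical and chosen only for this example; they do not come from the workshop material.

```python
# Ambiguous: the reader has to guess what A and foo mean.
def f(A, foo):
    return [a * foo for a in A]


# Self-documenting: the names say what goes in and what comes out.
# (Hypothetical names, used only for illustration.)
def scale_measurements(measurements, scale_factor):
    """Return each measurement multiplied by scale_factor."""
    return [value * scale_factor for value in measurements]


def mean(values):
    """Return the arithmetic mean, failing early with a clear message on empty input."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


# Small, reusable pieces compose into larger functionality.
def mean_scaled_measurement(measurements, scale_factor):
    """Scale the measurements, then average them."""
    return mean(scale_measurements(measurements, scale_factor))
```

Both versions of the scaling function compute the same thing, but only the second can be read without looking up how it is called. Raising the error inside `mean` keeps the failure close to its cause, which is one simple way to approach the isolation idea.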
Tip: you can use an LLM to recommend ways to refactor code, but make sure the code is something that can be made public, and use your knowledge of the existing code to check that the suggestions reference real APIs.

### Structuring Python Packages, Packaging

Clone Matthew's repository at: `git@github.com:matthewfeickert-talks/talk-urssi-summer-school-2024.git`

Then create the environment:

`$ conda env create --yes --file environment.yml`

Activate the environment:

`$ conda activate urssi-summer-school-2024-packaging`

Check that you are in the environment:

`$ command -v python`

#### Option 1: Inject the path of your code into your `sys.path` and use it to access what you need.

This works, but it's a band-aid on a problem. Your path can get very long, and your code is tied to a specific location on your local computer. As soon as you move the code elsewhere, it will break.

#### Option 2: Try to make your code installable.

* Go to `repo/examples/simple_packaging` and use `tree` to look at the directory structure.
* `src` is the directory that contains all of our modules
* `tests` or `test` is the directory that contains the files that you use to test the modules in `src`
* `.toml` marks a file written in "Tom's Obvious Minimal Language" (TOML), which aims to be a minimal configuration file format. **pyproject.toml** is the configuration file that you will use to build your package
* `pyproject.toml`
    * `[build-system]`: `requires` lists the tools that need to be installed to be able to package the software; `build-backend` names the backend that takes this information and builds the package.
    * `[project]` contains metadata that will be associated with the project, including the author and the version. The package index website will display this information when people navigate to the package.
        * Should we put maximum versions in `requires-python`? Matthew: There is almost never a good reason to do this.
You are dooming users to broken code. If users download the code in the future and the maximum version has been deprecated, they can't use that code. For libraries, put lower bounds, but not upper bounds. If you want an exact version, use a lockfile. The `classifiers` metadata is human readable and shows users which versions of Python you have tested against, without limiting what they can install.
* **README** is the first human-facing document that your package will show people. Try to explain what your package does, what it doesn't do, and how to use it, and direct people to use cases and examples.

"Dunder" is slang for "double underscore", so the "dunder init" file is the `__init__.py` file.

`python -m pip install .` installs the package that exists in the current working directory, denoted by `.`.

`python -m pip show rosen` tells us the location of our installed package. This should be inside our environment.

`python -m` is used as a prefix here to make sure that the `pip` you're installing with is the `pip` that belongs to the Python in your environment. This guards us from repeating Matthew's past follies.

Ok, now we have a working package! We can also build an "editable install" with the following command: `python -m pip install --upgrade --editable .`. This is VERY useful for development if you want to use your package while you're actively changing it.

Why build wheels? Wheels are pre-built versions of our code, built for specific architectures when the package includes compiled code. This allows other people (on different machines) to install your package without having to build it themselves. Wheels are hosted on package indexes on the internet. Probably the most ubiquitous package index is PyPI (the Python Package Index). If you upload wheels there, people can `pip install <yourcode>` and pip will automatically find it on the package index, so your users will be able to install your code easily.

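The check that `python -m pip show rosen` performs above can also be approximated from inside Python with the standard library's `importlib.metadata`. This is a minimal sketch, not part of the workshop material: the helper name `describe_installed_package` is hypothetical, and it assumes the distribution is called `rosen`, as in the `pip show` command above.

```python
import sys
from importlib import metadata


def describe_installed_package(name):
    """Return name, version, and install location of a distribution, or None if absent.

    Hypothetical helper for illustration; it only wraps importlib.metadata.
    """
    try:
        dist = metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return None
    return {
        "name": dist.metadata["Name"],
        "version": dist.version,
        # Directory the distribution's files are installed under (e.g. site-packages).
        "location": str(dist.locate_file("")),
    }


if __name__ == "__main__":
    # The interpreter running this script; it should live inside your environment.
    print("interpreter:", sys.executable)
    # Prints None if the package has not been installed into this environment.
    print(describe_installed_package("rosen"))
```

If the reported interpreter or location is outside your environment, the package was probably installed with a different `pip` than the one tied to your environment's Python, which is the situation the `python -m pip` prefix is meant to prevent.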
### Collaborating with git + github

### Testing and Linting

### Documenting and Versioning

## Open Science Practices

### Ethos

### Open Science Tools and Resources

### Open Data

### Open Results