Data science is no longer solo work; it is teamwork. The corollary: discipline enables better collaborative work.
How do we enable these?
from custom_source import load_data
df = load_data(commit="5j39fdm")
💯% reproducible.
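One way `load_data` might pin data to a version is to read an immutable snapshot keyed by the commit identifier. The sketch below assumes a hypothetical layout, `<root>/<commit>/data.csv`. The `root` parameter, the directory layout, and the CSV format are illustrative assumptions, not the actual `custom_source` implementation.

```python
import csv
from pathlib import Path


def load_data(commit: str, root: Path = Path("data")) -> list[dict[str, str]]:
    """Return the rows of the snapshot stored for `commit`.

    Assumes a hypothetical layout: <root>/<commit>/data.csv.
    """
    path = root / commit / "data.csv"
    if not path.is_file():
        # Fail loudly rather than silently falling back to "latest".
        raise FileNotFoundError(f"no data snapshot for commit {commit!r}")
    with path.open(newline="") as f:
        return list(csv.DictReader(f))
```

Because the identifier is part of the call, two people running the same line read the same bytes, and an unknown identifier raises an error instead of returning different data.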
Avoid this conversation:
"Which version of that function were you talking about? The one in Untitled12.ipynb?"
"No, the one in Untitled13.ipynb!"
Inside custom_source/functions.py:
def that_function():
    return stuff
Inside Untitled12.ipynb:
from custom_source.functions import that_function
Inside Untitled13.ipynb:
from custom_source.functions import that_function
If you modify a function that others depend on, it should still work for the existing use cases.
import pytest
from custom_source.functions import that_function

def test_that_function():
    result = that_function()
    # `something_correct` stands in for the known-good expected value.
    assert result == something_correct
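As a concrete illustration, suppose the shared function were a small helper that rescales numbers to the range 0 to 1. The source does not say what `that_function` does, so `rescale` here is a hypothetical stand-in. Tests like these pin down the existing use cases so that a later change cannot break them unnoticed.

```python
import math


def rescale(values: list[float]) -> list[float]:
    """Hypothetical shared helper: linearly rescale values to [0, 1]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Constant input: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_rescale_typical_input():
    result = rescale([2.0, 4.0, 6.0])
    expected = [0.0, 0.5, 1.0]
    assert all(math.isclose(r, e) for r, e in zip(result, expected))


def test_rescale_constant_input():
    assert rescale([3.0, 3.0]) == [0.0, 0.0]
```

Saved in a file whose name starts with `test_`, both functions would be collected and run by `pytest`.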
If you create, modify, and return dataframes, make sure that they follow expectations.
import pandera as pa
from custom_source.schemas import this_schema

@pa.check_output(this_schema)
def load_data(commit):
    ...
    return df
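To make the mechanism concrete, here is a standard-library-only sketch of what an output-checking decorator does. It is a hand-rolled stand-in, not pandera's implementation: the `check_output` decorator, the `SCHEMA` dictionary, and the list-of-dicts "table" are all hypothetical simplifications.

```python
from functools import wraps

# Hypothetical schema: column name -> required Python type.
SCHEMA = {"sample_id": str, "value": float}


def check_output(schema):
    """Validate every row a function returns against `schema`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            rows = func(*args, **kwargs)
            for i, row in enumerate(rows):
                if set(row) != set(schema):
                    raise ValueError(
                        f"row {i}: columns {sorted(row)} != {sorted(schema)}"
                    )
                for column, expected_type in schema.items():
                    if not isinstance(row[column], expected_type):
                        raise TypeError(
                            f"row {i}: {column!r} is not {expected_type.__name__}"
                        )
            return rows
        return wrapper
    return decorator


@check_output(SCHEMA)
def load_data(commit):
    # Placeholder data; a real loader would read the snapshot for `commit`.
    return [{"sample_id": "a1", "value": 0.5}]
```

The check runs on every call, so a change that alters the returned columns or types fails at the function boundary rather than somewhere downstream in a collaborator's notebook.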
But the code works on my system?!
But if it doesn't work on someone else's system…?
😫 Problem: most projects have a complex set of dependencies that aren't covered by one tool (e.g. pip).
😇 Solution: Explicitly specify all dependencies via configuration files:
environment.yml or requirements.txt for project dependencies
Dockerfile for system-level dependencies
Containers let you ship the dependency stack explicitly.
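A lightweight check can help keep such files explicit. The sketch below flags lines in a `requirements.txt`-style file that lack an exact `==` pin. The function name `unpinned_requirements` is hypothetical, the package versions in the example are arbitrary, and the parsing is deliberately simplistic: it skips comments, blank lines, and option lines starting with `-`, but does not handle the full requirements file format (for example, URLs or environment markers).

```python
def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version with `==`."""
    unpinned = []
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            # Skip blanks, comment-only lines, and option lines such as `-r other.txt`.
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Arbitrary example content, for illustration only.
example = """\
# project dependencies
pandas==1.4.2
pandera>=0.9
pytest
"""
print(unpinned_requirements(example))
```

Run in continuous integration, a check like this would surface loosely specified dependencies before they cause a "works on my system" mismatch.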
# Explicit version number!
# Standardize on some base image.
FROM condaforge/mambaforge:4.12.0-0
# Signal to next person that the project needs a `conda` environment.
COPY environment.yml /tmp/environment.yml
# Never deal with custom environment names in a Docker container.
# Always install to base.
RUN mamba env update -f /tmp/environment.yml -n base
# add additional steps below.
Work on the cloud. Create/destroy instances at will. Learn how to recreate environments. That will force portability.
Good project documentation enables others to quickly gain context.
Your future self will thank you.
Image credit: diataxis.fr
These will enable you to collaborate effectively and ship your work productively.