slide: https://hackmd.io/@ericmjl/software-ds
Data science is no longer solo work; it is teamwork. The corollary: discipline enables better collaborative work.
How do we build that discipline?
```python
from custom_source import load_data

# Load the data exactly as it existed at a specific commit.
df = load_data(commit="5j39fdm")
```
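The slide does not show how `load_data` resolves a commit. As a minimal sketch under a different, assumed scheme, the hypothetical helper below pins a CSV file to the SHA-256 hash of its bytes and refuses to load it if the file has changed since the hash was recorded. `load_pinned_csv` and `sha256_of` are illustrative names, not part of `custom_source`.

```python
import hashlib
from pathlib import Path

import pandas as pd


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_pinned_csv(path: Path, expected_sha256: str) -> pd.DataFrame:
    """Load a CSV only if its bytes match the pinned hash (hypothetical helper)."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        # Fail loudly instead of silently analysing different data.
        raise ValueError(
            f"{path} has hash {actual[:12]}..., expected {expected_sha256[:12]}...; "
            "the data changed since this analysis was pinned."
        )
    return pd.read_csv(path)
```

You would record the hash once, commit it next to the analysis code, and let any later change to the data raise an error rather than quietly alter results.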
💯% reproducible.
Avoid this conversation:

"Which version of that function were you talking about? The one in `Untitled12.ipynb`?"

"No, the one in `Untitled13.ipynb`!"
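Inside `custom_source/functions.py`:

```python
def that_function():
    # One canonical definition, imported everywhere instead of copy-pasted.
    stuff = ...  # placeholder for the real computation
    return stuff
```

Inside `Untitled12.ipynb`:

```python
from custom_source.functions import that_function
```

Inside `Untitled13.ipynb`:

```python
from custom_source.functions import that_function
```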
If you modify a function that others depend on, it should still work for the existing use cases.
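```python
import pytest

from custom_source.functions import that_function


def test_that_function():
    # Pin down the behaviour that existing callers rely on.
    result = that_function()
    assert result == something_correct  # `something_correct`: the known-good expected value
```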
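A slightly more concrete version of the test above, using a hypothetical `standardize` function in place of `that_function`: one test pins the known-good behaviour, and a parametrized test pins the edge cases that callers may depend on. The function and the file it would live in (say, `tests/test_functions.py`) are assumptions for illustration.

```python
import pytest


def standardize(values: list[float]) -> list[float]:
    """Hypothetical shared function: rescale to mean 0 and population std 1."""
    n = len(values)
    if n == 0:
        raise ValueError("standardize() needs at least one value")
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    if std == 0:
        raise ValueError("standardize() is undefined for constant input")
    return [(v - mean) / std for v in values]


def test_standardize_mean_and_std():
    # Existing use case: output should have mean 0 and variance 1.
    result = standardize([1.0, 2.0, 3.0, 4.0])
    mean = sum(result) / len(result)
    var = sum(r**2 for r in result) / len(result)
    assert mean == pytest.approx(0.0)
    assert var == pytest.approx(1.0)


@pytest.mark.parametrize("bad_input", [[], [5.0, 5.0, 5.0]])
def test_standardize_rejects_degenerate_input(bad_input):
    # Edge cases: callers rely on a clear error, not a ZeroDivisionError.
    with pytest.raises(ValueError):
        standardize(bad_input)
```

Running `pytest` from the project root collects both tests; if someone later changes `standardize`, a broken existing use case shows up as a failing test rather than a surprise in someone else's notebook.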
If you create, modify, or return dataframes, make sure they meet explicit expectations (column names, types, value ranges).
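```python
import pandera as pa

from custom_source.schemas import this_schema


@pa.check_output(this_schema)  # validate the returned DataFrame against the schema
def load_data(commit):
    ...  # loading logic that builds `df`
    return df
```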
But the code works on my system?!
But if it doesn't work on someone else's system…?
😫 Problem: most projects have a complex set of dependencies that no single tool (e.g. `pip`) covers.

😇 Solution: Explicitly specify all dependencies in configuration files:

- `environment.yml` or `requirements.txt` for project dependencies
- `Dockerfile` for system-level dependencies

Containers let you ship the dependency stack explicitly.
```dockerfile
# Explicit version number!
# Standardize on some base image.
FROM condaforge/mambaforge:4.12.0-0

# Signal to next person that the project needs a `conda` environment.
COPY environment.yml /tmp/environment.yml

# Never deal with custom environment names in a Docker container.
# Always install to base.
RUN mamba env update -f /tmp/environment.yml -n base

# add additional steps below.
```
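Configuration files only help if the running environment actually matches them. As a complementary sketch (not part of the original workflow), the hypothetical `check_pins` below compares exact pins from a `requirements.txt`-style file against installed versions using the standard library's `importlib.metadata`. It assumes plain `name==version` lines and skips everything else, so extras, ranges, and environment markers are out of scope.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(requirement_lines: list[str]) -> dict[str, str]:
    """Compare exact `name==version` pins against installed packages.

    Returns {package name: problem}; an empty dict means every pin matches.
    Lines without `==` (ranges, blank lines, comments) are skipped.
    """
    problems: dict[str, str] = {}
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop inline comments and whitespace
        if not line or "==" not in line:
            continue
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = f"not installed (pinned {pinned})"
            continue
        if installed != pinned:
            problems[name] = f"installed {installed}, pinned {pinned}"
    return problems
```

For example, calling `check_pins(Path("requirements.txt").read_text().splitlines())` at the top of a notebook turns "works on my machine" drift into an explicit list of mismatches.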
Work on the cloud. Create/destroy instances at will. Learn how to recreate environments. That will force portability.
Good project documentation enables others to quickly gain context.
Your future self will thank you.
Image credit: diataxis.fr
These practices will enable you to collaborate effectively and ship your work productively.