# Eat 'n Learn
## Test Driven Development for Natural Language Processing
---
## Presentation structure
1. What does ML/NLP development normally look like?
2. Problems we faced.
3. How we use TDD to solve (some of) the problems :person_doing_cartwheel:
4. Worked example.
---
## Parts of an ML/NLP System
1. Preprocess the dataset (inputs and labels).
2. Initialize the model.
3. Split the dataset into batches.
4. For each batch:
   a) Make predictions.
   b) Learn from the difference between predicted and actual labels.
5. Evaluate on unseen data and report results.
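
The five steps above can be sketched in plain Python. This is only an illustration: the toy dataset, the perceptron model, and every name here (`TRAIN`, `UNSEEN`, `run_pipeline`, and so on) are assumptions made for the sketch, not part of the presenters' system.

```python
# Hypothetical toy dataset: the label is 1 when the text contains "good".
TRAIN = [("good movie", 1), ("bad movie", 0), ("good plot", 1),
         ("bad plot", 0), ("good acting", 1), ("bad acting", 0)]
UNSEEN = [("good ending", 1), ("bad ending", 0)]
VOCAB = sorted({w for text, _ in TRAIN + UNSEEN for w in text.split()})


def preprocess(raw):
    """Step 1: turn each text into a bag-of-words vector."""
    return [([float(w in text.split()) for w in VOCAB], label)
            for text, label in raw]


def batches(data, size):
    """Step 3: split the dataset into batches."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def predict(weights, bias, x):
    """Step 4a: perceptron prediction, either 0 or 1."""
    return int(sum(w * xi for w, xi in zip(weights, x)) + bias > 0)


def run_pipeline(epochs=3, batch_size=2):
    train, unseen = preprocess(TRAIN), preprocess(UNSEEN)
    weights, bias = [0.0] * len(VOCAB), 0.0  # Step 2: initialize the model
    for _ in range(epochs):
        for batch in batches(train, batch_size):  # Step 4: for each batch
            for x, y_true in batch:
                # Step 4b: learn from predicted vs. actual label.
                error = y_true - predict(weights, bias, x)
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
    # Step 5: evaluate on unseen data and report accuracy.
    correct = sum(predict(weights, bias, x) == y for x, y in unseen)
    return correct / len(unseen)


print(run_pipeline())
```

Each step is its own small function, which is what makes it possible to test the pieces separately.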
---
## The Problems
1. Highly non-deterministic behaviour.
2. Unavoidable complexity.
3. Unpredictable and unexplainable results (even with deterministic components).
---
## What do we get out of TDD?
_This should sound familiar_
1. Break up the system into manageable components.
2. Weak correctness guarantees.
3. **Interactive development environment.**
---
## _Problem:_ Training the model.
We'd like to make sure that:
1. Code runs without errors :shrug:
2. All model weights change based on training data.
---
## Why is it not trivial?
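
One reason, offered as an illustration rather than the presenters' own example: code can run without errors while some weights never update. The sketch below assumes a hypothetical two-layer model whose optimizer is accidentally given only the last layer's parameters.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical model and data, chosen only for illustration.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
X = torch.randn(16, 4)
y_true = torch.randint(0, 2, (16,))

# Bug: only the last layer's parameters reach the optimizer.
optimizer = torch.optim.SGD(model[2].parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

before = [p.detach().clone() for p in model.parameters()]
optimizer.zero_grad()
loss = criterion(model(X), y_true)
loss.backward()
optimizer.step()  # runs without any error or warning
after = [p.detach().clone() for p in model.parameters()]

# One flag per parameter tensor: True means "identical after the step".
unchanged = [torch.equal(b, a) for b, a in zip(before, after)]
print(unchanged)
```

The first layer's weight and bias come back unchanged even though the training step completed normally, which is the kind of silent failure a test has to look for explicitly.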

---
## The Tests
```python
import torch

from .fixtures import torch_trainer


def test_torch_trainer_changes_model_params(torch_trainer):
    # Snapshot every parameter tensor before training.
    params1 = [p.detach().clone()
               for p in torch_trainer.model.parameters()]
    torch_trainer.train()
    # Snapshot again once training has run.
    params2 = [p.detach().clone()
               for p in torch_trainer.model.parameters()]
    # Fail if any tensor is identical to its pre-training copy.
    assert not any(
        torch.equal(p1, p2) for p1, p2 in zip(params1, params2)
    ), "Parameters didn't change after training"
```
---
## The Code
```python
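# Trainer method, shown without its enclosing class.
def train(self):
    self.model.train()  # switch the model to training mode
    for batch in self.train_dataloader:
        self.optimizer.zero_grad()  # clear gradients from the last batch
        X, y_true = batch
        y_pred = self.model(X)  # make predictions
        loss = self.criterion(y_pred, y_true)  # compare with actual labels
        loss.backward()  # backpropagate the loss
        self.optimizer.step()  # update the weights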
```
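
The test imports a `torch_trainer` fixture that these slides do not show. A minimal sketch of what it might look like follows. The `TorchTrainer` class, the `make_torch_trainer` helper, the tiny model, and the random data are all assumptions made for illustration.

```python
import pytest
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class TorchTrainer:
    """Hypothetical trainer exposing the attributes the test relies on."""

    def __init__(self, model, train_dataloader, optimizer, criterion):
        self.model = model
        self.train_dataloader = train_dataloader
        self.optimizer = optimizer
        self.criterion = criterion

    def train(self):
        # Same loop as the train method above.
        self.model.train()
        for X, y_true in self.train_dataloader:
            self.optimizer.zero_grad()
            loss = self.criterion(self.model(X), y_true)
            loss.backward()
            self.optimizer.step()


def make_torch_trainer():
    """Build a small, fast trainer on random data (illustrative only)."""
    torch.manual_seed(0)  # fixed seed keeps the test repeatable
    X = torch.randn(32, 4)
    y = torch.randint(0, 2, (32,))
    loader = DataLoader(TensorDataset(X, y), batch_size=8)
    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    return TorchTrainer(model, loader, optimizer, nn.CrossEntropyLoss())


@pytest.fixture
def torch_trainer():
    # pytest injects this into any test with a `torch_trainer` argument.
    return make_torch_trainer()
```

Keeping the model and dataset tiny means the test finishes in well under a second, which helps when running it on every change.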
---
## Further Learning
1. For more applied AI: fast.ai
2. For more ML/NLP engineering: pair up with one of the data science peeps.
---
## Thank you for your curiosity.
_Armin and Mou describe how they use TDD to solve common engineering problems faced by data scientists._