# Eat 'n Learn

## Test Driven Development for Natural Language Processing

---

## Presentation structure

1. What does ML/NLP development normally look like?
2. Problems we faced.
3. How we use TDD to solve (some of) the problems :person_doing_cartwheel:
4. Worked example.

---

## Parts of an ML/NLP System

1. Preprocess the dataset (inputs and labels).
2. Initialize the model.
3. Split the dataset into batches.
4. For each batch:
   a) Make predictions.
   b) Learn from the difference between predicted and actual labels.
5. Evaluate on unseen data and report results.

---

## The Problems

1. Highly non-deterministic behaviour.
2. Unavoidable complexity.
3. Unpredictable and unexplainable results (even with deterministic components).

---

## What do we get out of TDD?

_This should sound familiar_

1. Break up the system into manageable components.
2. Weak correctness guarantees.
3. **Interactive development environment.**

---

## _Problem:_ Training the model.

We'd like to make sure that:

1. The code runs without errors :shrug:
2. All model weights change based on the training data.

---

## Why is it not trivial?

![](https://i.imgur.com/w2QWHdV.png)

---

## The Tests

```python
import numpy as np
from .fixtures import torch_trainer


def test_torch_trainer_changes_model_params(torch_trainer):
    # Snapshot every parameter as a NumPy copy before training.
    # (torch tensors have no .copy(); detach and convert first.)
    params1 = [p.detach().cpu().numpy().copy()
               for p in torch_trainer.model.parameters()]
    torch_trainer.train()
    # Snapshot again after training.
    params2 = [p.detach().cpu().numpy().copy()
               for p in torch_trainer.model.parameters()]
    # Fail if any parameter tensor is identical before and after.
    assert not any(np.array_equal(p1, p2)
                   for p1, p2 in zip(params1, params2)), \
        "Parameters didn't change after training"
```

---

## The Code

```python
def train(self):
    # Put the model in training mode (enables dropout, batch norm updates).
    self.model.train()
    for batch in self.train_dataloader:
        # Clear gradients left over from the previous batch.
        self.optimizer.zero_grad()
        X, y_true = batch
        y_pred = self.model(X)
        loss = self.criterion(y_pred, y_true)
        # Backpropagate, then update the weights.
        loss.backward()
        self.optimizer.step()
```

---

## Further Learning

1. For more applied AI: fast.ai
2. For more ML/NLP engineering: pair up with one of the data science peeps.

---

## Thank you for your curiosity.
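
---

## Appendix: a possible fixture

The test slide imports `torch_trainer` from a fixtures module that the deck does not show. The sketch below is one way such a trainer could be built. It is not the original fixture: `TorchTrainer`, `make_torch_trainer`, `unchanged_parameters`, the random data, and all sizes are assumptions made for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class TorchTrainer:
    """Hypothetical minimal trainer with the attributes train() relies on."""

    def __init__(self, model, train_dataloader, optimizer, criterion):
        self.model = model
        self.train_dataloader = train_dataloader
        self.optimizer = optimizer
        self.criterion = criterion

    def train(self):
        # One pass over the training data, same steps as the train() slide.
        self.model.train()
        for X, y_true in self.train_dataloader:
            self.optimizer.zero_grad()
            loss = self.criterion(self.model(X), y_true)
            loss.backward()
            self.optimizer.step()


def make_torch_trainer(seed=0):
    """Build a tiny trainer on random data (sizes are arbitrary)."""
    torch.manual_seed(seed)  # fixed seed so repeated runs behave the same
    X = torch.randn(32, 4)
    y = torch.randint(0, 2, (32,))
    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
    loader = DataLoader(TensorDataset(X, y), batch_size=8)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    return TorchTrainer(model, loader, optimizer, nn.CrossEntropyLoss())


def unchanged_parameters(trainer):
    """Train once and return the names of parameters that did not change."""
    before = {name: p.detach().clone()
              for name, p in trainer.model.named_parameters()}
    trainer.train()
    return [name for name, p in trainer.model.named_parameters()
            if torch.equal(before[name], p.detach())]
```

Wrapping `make_torch_trainer` in a `pytest` fixture would give the test a fresh trainer per run. Returning parameter names rather than a bare boolean makes a failure easier to read: if a layer is frozen with `requires_grad_(False)`, its parameters are listed by name.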
{"metaMigratedAt":"2023-06-16T03:07:40.271Z","metaMigratedFrom":"YAML","title":"Test Driven Development for Natural Language Processing","breaks":true,"description":"Armin and Mou describe how they use TDD to solve common engineering problems faced by data scientists.","contributors":"[{\"id\":\"39d92a74-a578-44a6-a68c-dcb0d15b7c95\",\"add\":6157,\"del\":3785}]"}