---
title: engineering-meetings
hackmd_link: https://hackmd.io/@understanding-search/engineering-meetings
---

# engineering-meetings

# Week 1

## Agenda

- Codebase tour
- High-level goals
- Status updates
- Week 2 plans

## Notes

### Configuration management resources

- Hydra
- Weights & Biases
- Big(?) open question: what is our config management strategy going forward?

### High-level goals

- We don't have strong confidence in our roadmap yet because a lot of it will depend on the results of experiments in the next 1-2 weeks. Some possibilities:
  - Get a trained model: this may be less trivial than we initially thought, so we may need to train a lot of models
  - Training and logging infrastructure
  - Training & evaluation pipeline
    - Might be high priority to use Weights & Biases for training & evaluation
  - We definitely want to improve how we are handling configs; this may be pending completion of ZANJ
  - We will definitely be implementing a bunch of interpretability and visualization techniques; timeline and details TBD
  - There's always work to be done fixing bugs, adding tests, etc.

### Status updates

- Everyone on the team has the codebase set up locally, and most have done a small PR! 🥳
- There are no remaining blockers to making the repo public

### Week 2 goals

- To discuss early next week - need to wait and see how experiments go
- For now we can focus on any TODO issues on the board
- **Update:** Our main focus this week is on enabling the research team to train a successful model. In our meeting with the research team we identified several issues that are currently making this hard, and we've started working on them.
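The config-handling question above is still open. As one possible direction, here is a minimal sketch of a dataclass-based config holder that round-trips through plain dicts (suitable for JSON- or ZANJ-style storage); all names here (`TrainConfig`, `ConfigHolder`, field names) are hypothetical illustrations, not the actual maze_transformer classes:

```python
from dataclasses import dataclass, asdict, field

# Hypothetical sketch: these class and field names are illustrative only.
@dataclass
class TrainConfig:
    lr: float = 1e-3
    batch_size: int = 32
    epochs: int = 10

@dataclass
class ConfigHolder:
    name: str
    train: TrainConfig = field(default_factory=TrainConfig)

    def serialize(self) -> dict:
        # Plain-dict form, so the config can be stored alongside a run.
        return asdict(self)

    @classmethod
    def load(cls, data: dict) -> "ConfigHolder":
        return cls(name=data["name"], train=TrainConfig(**data["train"]))

cfg = ConfigHolder(name="demo")
restored = ConfigHolder.load(cfg.serialize())
print(restored == cfg)  # dataclass equality: the round trip preserves the config
```

The appeal of keeping configs as plain dataclasses is that serialization stays trivial and diff-friendly, whichever storage backend (ZANJ, W&B, raw JSON) ends up being chosen.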
### Next actions

- [x] Arrange meeting to discuss training & evaluation infrastructure
- [x] Create status update doc (or Slack channel if we get the pro plan)
- [x] Set up GitHub integration for Slack
- [ ] Create project board readme
- [x] Discuss week 2 goals by Tuesday at the latest

# Week 2

## Agenda

- Status updates
- Week 2 plans

## Notes

- Test coverage map? Test by branches? Dashboard to indicate health of codebase?
- Question for W&B: we now have an integration test that looks at loss logs after training - can we use W&B for this?
  - We could have E2E tests leveraging W&B (but not in CI) to evaluate training performance
- External contributors? Doing a bigger demo notebook would help onboard them.

# Week 3

## Agenda

- Status updates
  - Michael
  - Dan
  - Rusheb
  - Lucia
  - Can
  - Chris
  - Alex
- Look at board - current priorities?
  - In-flight
  - Any pain points from research team?
  - Non-urgent but high-value work - e.g. features we will need later or big refactorings
    - Rusheb - cfg.device bug https://searchingforsearch.slack.com/archives/C04S0P93BNE/p1679579393252949
- Week 4 plans

## Notes

- Michael - ZANJ serialization
- Dan - batched model evals
- Rusheb - WandB PR is up
  - Logging demo
  - Questions about the value of stuff in the summary
- Lucia - truncation wrapped up, looking for work
- Can - wrapping up as_img PR, moving on to more visualizations of attention
- Alex - merged big PR, might need a ticket for some tests
- Office hours
  - Do we need more?
  - Should Alex do some?
- Priority
  - Eval stuff
  - Resuming runs
  - Model sharing
    - Alex: might be some complexity here due to W&B not using ConfigHolder
    - Maybe ConfigHolder should use muutils to serialize/load (Michael can work on this)
    - Alex: when loading a model, we should be able to use the current workflow, or pass a link to a W&B run or a link to a local run
- Configs?
- Handling multiple tokenization schemes?
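On Alex's point about loading a model from the current workflow, a W&B run, or a local run: one way to sketch the dispatch is to route on the shape of the source string before handing off to a loader. This is a hypothetical illustration only - it does not call the real W&B or ZANJ APIs, and the return labels are stand-ins for the actual loading paths:

```python
from pathlib import Path

def classify_model_source(source: str) -> str:
    """Decide which loading path a model reference should take.

    Hypothetical helper: routes a user-supplied string to one of the
    loading paths discussed in the meeting. The loaders themselves are
    not implemented here.
    """
    if source.startswith(("https://wandb.ai/", "wandb://")):
        return "wandb_run"      # fetch the checkpoint via the W&B API
    if Path(source).suffix == ".zanj":
        return "zanj_file"      # current workflow: load a serialized model file
    return "local_run_dir"      # otherwise treat it as a local training-run directory

print(classify_model_source("https://wandb.ai/team/project/runs/abc123"))  # wandb_run
print(classify_model_source("model.zanj"))  # zanj_file
```

Keeping the routing in one function means the eventual `load_model` entry point can accept all three source types without callers needing to know which backend is involved.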
## Action items

- [ ] Make ticket to resume a run from a WandB checkpoint
- [ ] Make ticket to improve ease of imports from maze_transformer
  - inits?
  - maybe IDEs/VS Code can import things automatically?
- [ ] Create ticket for multiple tokenization schemes
- [ ] Start a conversation about config refactoring and tokenization schemes on related tickets
- [ ] Create ticket for adding x,y tokenization scheme (for Lucia to work on)
- [ ] Create label for big-picture discussion issues
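For the multiple-tokenization-schemes ticket, a minimal sketch of what a pluggable scheme interface could look like, including the proposed x,y scheme; all names here are hypothetical, not the actual maze_transformer API:

```python
from typing import Protocol

class TokenizationScheme(Protocol):
    """Anything that can turn a maze coordinate into a token string."""
    def coord_to_token(self, coord: tuple[int, int]) -> str: ...

class SingleTokenScheme:
    """Current-style scheme: one token per maze coordinate."""
    def coord_to_token(self, coord: tuple[int, int]) -> str:
        return f"({coord[0]},{coord[1]})"

class XYTokenScheme:
    """Proposed x,y scheme: separate tokens for the two axes."""
    def coord_to_token(self, coord: tuple[int, int]) -> str:
        return f"x{coord[0]} y{coord[1]}"

def tokenize_path(scheme: TokenizationScheme, path: list[tuple[int, int]]) -> list[str]:
    return [scheme.coord_to_token(c) for c in path]

path = [(0, 0), (0, 1), (1, 1)]
print(tokenize_path(SingleTokenScheme(), path))  # ['(0,0)', '(0,1)', '(1,1)']
print(tokenize_path(XYTokenScheme(), path))      # ['x0 y0', 'x0 y1', 'x1 y1']
```

A `Protocol` keeps the schemes structurally interchangeable, so training and eval code can take any scheme without an inheritance hierarchy - which should also make the config-refactoring conversation easier, since the active scheme becomes a single config field.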