## TokenBender's [X thread](https://x.com/tokenbender/status/1966149102812770480) on starting out -> training/RL your own models.
Sep 11, 2025
**Pradeep's note**: *I discovered HackMD via [this post](https://x.com/ChinmayKak/status/1966867880164946087) and wanted to try it out; I had TokenBender's thread open in another tab and decided to compile it here. The content below belongs to TokenBender. HackMD is pretty cool!*
> #1 - where to start? is nanogpt enough? do i need anything before starting with karpathy video?
not really. i think nanogpt and some googling/llm search is all you need to understand the barebones of things. provided you go block diagram level first.
> #2 - how much maths do i need? i feel confused reading diffusion or understanding certain mechanisms in transformers.
knowing everything helps, but at minimum you should know what each block does as a function.
input:output relations for every block.
leave deeper questions for round 2 and see e2e first.
> #3 - how do i start when i don't even have a GPU?
you can simply take an existing basic nanogpt script and reduce config to the smallest numbers - layers/dim/head
make it run on kaggle where you get quite a bit of VRAM and free access for starting out.
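as a sketch of what "reduce the config to the smallest numbers" can look like (field names mirror a nanogpt-style config; treat the exact names as assumptions and check the script you actually use):

```python
from dataclasses import dataclass

# illustrative tiny config, loosely modeled on a nanogpt-style GPTConfig;
# the defaults in real scripts are much larger (e.g. 12 layers, 768 dim)
@dataclass
class TinyGPTConfig:
    n_layer: int = 2       # down from e.g. 12
    n_head: int = 2        # down from e.g. 12
    n_embd: int = 64       # down from e.g. 768
    block_size: int = 64   # shorter context = less VRAM
    vocab_size: int = 256  # byte-level keeps the embedding table small

cfg = TinyGPTConfig()
# rough parameter count of the transformer blocks: ~12 * n_layer * n_embd^2
approx_params = 12 * cfg.n_layer * cfg.n_embd ** 2
print(approx_params)  # ~100k params
```

the point is that a ~100k-param model steps in seconds on a free kaggle/colab GPU, so you can watch the whole loop run end to end before scaling anything.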
> #4 - what do i do when i do not understand something? i can't move forward i get lost in searching meaning of terms.
just ask llms but keep things on the level of a visitor who is out there to give name to every code block first.
this chunk expects abc, does xyz. that's all.
now once you are familiar with the territory, you can go more granular and take things one at a time.
but the dopamine juice of reaching the end must flow first.
> #5 - LLMs don't generate end to end code correctly, and experts don't teach for my basic needs
you need toy snippets of everything - basic dense, moe, sft, all RL algos, quantization, inference.
everything is on github. find respected codebases and ask models to use them as gold references.
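for example, a toy top-1 moe router can be a few lines of plain python; purely illustrative (real routers are learned linear layers and experts are MLPs), just to name the input:output relation of "routing":

```python
# toy top-1 moe router: each token goes to exactly one "expert" (here a plain
# function), chosen by argmax over router scores. illustrative only.
def route(token_scores, experts):
    outputs = []
    for tok, scores in token_scores:
        k = max(range(len(scores)), key=lambda i: scores[i])  # argmax = top-1
        outputs.append(experts[k](tok))
    return outputs

experts = [lambda t: t * 2, lambda t: t + 100]  # two stand-in "experts"
toks = [(1, [0.9, 0.1]), (2, [0.2, 0.8])]      # (token, router scores)
print(route(toks, experts))  # [2, 102]
```

once the block diagram is this clear in your head, reading a real moe implementation is mostly matching names to boxes.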
> #6 - i bookmark 200 posts a day, always in FOMO, i feel i am missing everything needed for a possible interview
cut that stuff out. the daily decrease will save you, not the daily increase. just write down one small objective to learn and work on things that are one google/llm search + 1 click away to get started.
if you do not know of any, ask here on twt, many would answer.
> #7 - where should i start?
you'd have to decide that for yourself first. but i suggest picking a stack - easy to hard based on availability and cost.
agent/api layer -> inference optimisation -> post training (SFT + RL) -> pretraining (just nanogpt)
learn standard methods on every layer.
reduce your job to only the configs and learn how to stay code agnostic at every layer first.
agent -> what context gets my job done?
inference -> popular vllm applications, avoid kernel layer as a beginner
post training -> what rubrics/envs do i write for RL? (it is just functions and scripts) and what dataset should i use for SFT?
pretraining -> what makes dataset A better than dataset B? which architectures are used in sota releases? what optimizer/lr does everyone like? which hparams matter if i lock the arch and optimizer? and so on.
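the "reduce your job to only the configs" mindset can be as mechanical as a sweep list; the hparams and values below are placeholders, not recommendations:

```python
import itertools

# illustrative: lock the arch and optimizer, sweep only the hparams you
# suspect matter, one training invocation per combo
sweep = {
    "lr": [3e-4, 6e-4],
    "warmup_steps": [100, 500],
    "weight_decay": [0.0, 0.1],
}
runs = [dict(zip(sweep, combo)) for combo in itertools.product(*sweep.values())]
print(len(runs))  # 8 configs
```

each entry in `runs` is one experiment; the code underneath never changes, only the config does.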
after you're done with this, you'd realise anything else that remains is basically a process of discovery for newer solutions. sota is always formidable in DL and you can't find alternatives without experiencing the problems first, so avoid that unless you clearly know what you dislike after you have seen the lay of the land.
> #8 - how to read papers? there are so many papers everyday?
if you already know the basics, you can just pick papers in the stack you picked as mentioned above. only stick to familiarising yourself with the pain points and what everyone is trying to solve.
after some time, each paper becomes a delta to you: this paper adds an xyz twist to abc and compares against its peers.
and enough familiarity with an area means you can mentally reduce every paper to a block diagram or a line in a function, with the memory of what happened and why.
> #9 - but i only want to work on hot shit. but hot shit requires compute and kernels and mech interp and i can't be sexy talking about api, data, evals and i would look like a jobber.
you can be useful anywhere. learn to be useful at scale first. then you'd start seeing that none of that really matters and the pull is mostly a symptom of liking whatever is scarce or access-gated.
> #10 - what do i do if i want to write sota RL?
read this first to get familiar with the methods, but don't drown in it; learn it like a caveman who can only answer "useful" or "not useful" for each term.
[Understanding Reinforcement Learning for Model Training by Rohit Patel dt Aug 18, 2025](https://drive.google.com/file/d/1q8664fOAUTqz1JHPzbvtNgMvVlqwU_77/view?pli=1)
then you can pick unsloth notebooks and just search for the portion where some rubrics are being created and slightly modify one thing and see if you broke stuff or if it still works.
you need to develop a feel for whether the model will learn or not based on your rewards.
just understand a task, what the model would need to learn for it, how to write rubrics for that, and which rewards were bad, which were good, and why.
the latter portion of this is quite valuable and is a fruit of experience.
so you need to pick tasks - math, tool calling.
and then tweak stuff around.
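a minimal sketch of what a rubric/reward function for a math task can look like; the signature and the "#### answer" format here are assumptions for illustration, not unsloth's exact interface:

```python
import re

# toy rubric for a math task: full reward for the exact final answer,
# a small shaping reward for a well-formed but wrong answer, a penalty
# for no parseable answer at all
def math_reward(completion: str, gold: str) -> float:
    m = re.search(r"####\s*(-?\d+)", completion)  # assume "#### 42"-style answers
    if m is None:
        return -1.0   # no parseable answer: punish
    if m.group(1) == gold:
        return 1.0    # exact match: full reward
    return 0.1        # wrong but well-formed: tiny shaping reward

print(math_reward("reasoning... #### 42", "42"))  # 1.0
print(math_reward("i think it's 42", "42"))       # -1.0
```

tweaking one thing at a time (the shaping value, the penalty, the answer regex) and watching whether training still moves is exactly the "modify one thing and see if you broke stuff" loop.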
once you have the rewards working based off a working notebook,
then understand training stability and which RL algorithms like which hparams for stability. what contributes, what breaks things?
but in all the cases, have a golden working toy reference and then change one thing at a time there.
write it down. if something makes sense, good. if it does not, ask on twt or to your fav LLM.
> #11 - there are so many posts, blogs and i do not know where to start? half the time someone is selling something to me.
stick to oss first. there are only a couple of things to learn if you don't choke yourself with frameworks. you should know that everything is a loop and some screws and wrench to avoid things falling off.
so pick a blog or post that gets you in and out the fastest, but make sure it's something you can run on your own GPU, or on kaggle or google colab, easily. if you can't, throw it away; there is no lack of grifters in this field who would farm you every day with 20 screenshots and "cracked alpha from-scratch" every week.
nobody can save you from that, but you only need to suffer for a short while to distinguish signal from noise. pick things with an action and verifiability bias: what you can't code and test today, throw it away.
> #12 - what is nanogpt for ft/rl?
the RL2 repo is good; it serves for both education and production. i don't recommend treating large frameworks as a nanogpt for RL.
> #13 - anything to get started for jailbreak or red teaming?
you wouldn't find a nanogpt equivalent for this, but you'd find many tools. start with a survey paper, then find appropriate scripts/prompt references online for each.
https://arxiv.org/abs/2404.00629