- Did grad school at CMU and am currently doing a PhD in LLMs for low-resource programming languages (probabilistic programs, COBOL, Ansible, ...).
- Lead the training engineering effort in model post-training at the MIT-IBM AI Lab. Our Mistral [fine-tune](https://huggingface.co/ibm/merlinite-7b) is better in many respects than Mistral's own Instruct v2.
- Extensive expertise in large-scale distributed training of models at maximum hardware utilization using PyTorch's FSDP or DeepSpeed, plus Flash Attention and Multipack for padding minimization (a toy packing sketch follows this list). This also comes with a deep understanding of the "magic" needed to get models to learn the data, and of how to find, filter, and improve the data you need for your use case. CoT, self-instruct, and many other techniques can be found in our paper.
- Developed the first true way to "merge" knowledge into a centralized model, and we launched [InstructLab](https://github.com/instructlab), an open-source project that enables anyone with no ML experience to contribute. This is the paper: [https://arxiv.org/abs/2403.01081](https://arxiv.org/abs/2403.01081).
- Back in 2018, when transformers were not as cool, I also implemented a transformer from scratch in Julia using the Flux library, and my JAX implementation is deeply inspired by it (see the attention sketch below).
- Deep expertise with JAX for general scientific computing: in this [research repo](https://github.com/aldopareja/CNF-diff-probprog) I’ve explored automatic amortization of any probabilistic program in [NumPyro](https://num.pyro.ai/en/latest/getting_started.html). I used [Equinox](https://docs.kidger.site/equinox/) to build a transformer-based amortization engine that can be combined with virtually any modern generative model architecture: [RealNVP](http://arxiv.org/abs/1605.08803), [Continuous Normalizing Flows](https://arxiv.org/abs/2203.10335), [Diffusion Models](https://arxiv.org/abs/2011.13456), and more (a coupling-layer sketch appears below). I used JAX because its JIT compiler enables unorthodox computational graphs on GPUs, handling more nuanced scientific-computing loads with a much more elegant functional base than PyTorch.
- My [first published paper](https://www.semanticscholar.org/paper/EvolveGCN%3A-Evolving-Graph-Convolutional-Networks-Pareja-Domeniconi/362e416c5f55b056a6c5930d55d8e3588efce9b9) has 1000+ citations.
- I was ranked 12th in Colombia’s version of the SAT (among >140,000 students).
- I got a [scholarship](https://uniandes.edu.co/sites/default/files/Quiero-Estudiar-2016_0.pdf) at the best university in Colombia to pursue electrical engineering.
- I was among the top 10 electrical engineering scorers on Colombia’s graduate SAT (all graduates from Colombian universities must take the exam).
- I also hold a degree in economics, and I’m deeply inspired by observing and analyzing how the decentralized nature of humanity drives us forward into a chaotic and mesmerizing future!
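
Since Multipack is only name-dropped above, here is a minimal sketch of the padding-minimization idea behind it, assuming a simple first-fit-decreasing bin-packing heuristic; the real Multipack sampler is more elaborate (it also balances token counts across GPU ranks), so treat this as an illustration rather than the actual implementation.

```python
# Toy illustration of padding minimization via sequence packing.
# Assumption: first-fit-decreasing stands in for the real Multipack sampler.

def pack_sequences(lengths: list[int], max_tokens: int) -> list[list[int]]:
    """Group sequence indices into bins of at most `max_tokens` tokens."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: list[list[int]] = []   # sequence indices per bin
    loads: list[int] = []        # current token count per bin
    for i in order:
        for b, load in enumerate(loads):
            if load + lengths[i] <= max_tokens:
                bins[b].append(i)
                loads[b] += lengths[i]
                break
        else:  # no existing bin fits: open a new one
            bins.append([i])
            loads.append(lengths[i])
    return bins

# Pack sequences of these lengths into 4096-token bins; each bin becomes one
# batch row, so padding per row is only max_tokens minus the bin's load.
print(pack_sequences([4000, 2500, 1500, 1200, 800, 90], max_tokens=4096))
# -> [[0, 5], [1, 2], [3, 4]]
```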
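The transformer bullet mentions a JAX port of the Julia/Flux implementation; a minimal single-head scaled dot-product attention in JAX might look like the sketch below (hypothetical shapes, not the original code).

```python
import jax
import jax.numpy as jnp

def attention(q, k, v):
    """Single-head scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.T / jnp.sqrt(d)      # (seq_q, seq_k) similarity logits
    weights = jax.nn.softmax(scores, axis=-1)
    return weights @ v                  # (seq_q, d) attention output

key = jax.random.PRNGKey(0)
q, k, v = (jax.random.normal(k_, (8, 16)) for k_ in jax.random.split(key, 3))
out = jax.jit(attention)(q, k, v)       # jit-compiled; runs on GPU if present
```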
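The research-repo bullet names RealNVP among the supported generative architectures; a minimal affine coupling layer in Equinox (hypothetical sizes, not code from the repo) shows the pattern such an engine builds on.

```python
import jax
import jax.numpy as jnp
import equinox as eqx

class AffineCoupling(eqx.Module):
    """One RealNVP coupling layer: freeze half the dims, affinely warp the rest."""
    net: eqx.nn.MLP

    def __init__(self, dim: int, key):
        # Conditioner maps the frozen half to a log-scale and shift for the other half.
        self.net = eqx.nn.MLP(dim // 2, dim, width_size=64, depth=2, key=key)

    def __call__(self, x):
        x1, x2 = jnp.split(x, 2)
        s, t = jnp.split(self.net(x1), 2)   # log-scale and shift
        y2 = x2 * jnp.exp(s) + t
        log_det = jnp.sum(s)                # log |det Jacobian| of the warp
        return jnp.concatenate([x1, y2]), log_det

layer = AffineCoupling(dim=4, key=jax.random.PRNGKey(0))
y, log_det = layer(jnp.arange(4.0))
```

Because the transform is triangular, the log-determinant is just the sum of the log-scales, which is what makes flows like this cheap to train by maximum likelihood.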