Rust ML WG Meeting 00011

# Rust ML WG Meeting 00011 [![hackmd-github-sync-badge](https://hackmd.io/pzG1QdqOTymDiyxtPZnyJA/badge)](https://hackmd.io/pzG1QdqOTymDiyxtPZnyJA) ## Meeting Info Date: 20201006 Start time: 1600 ET Zoom: https://ucsb.zoom.us/j/6601852842 ## Agenda * Transfer ZuseZ4/datasets to Rust-ML-WG? * Rust ML Optimisation * Research on Optimisation Path (FPGA/Embedded Devices) * CPU version of framework as first deliverable? * SmartCore * https://smartcorelib.org/ * https://www.reddit.com/r/rust/comments/j1mj1g/smartcore_fast_and_comprehensive_machine_learning/ * Review of linfa updates * chrism: updates on arewelearningyet.com * chrism: has been doing some research about open source licensing in machine learning * AndrazT: Anyone interested in on-line learning machinery in rust - like Vowpal Wabbit? * Ricky: Rust ML Meetup 20201127 * https://hackmd.io/7hStiyyHQcCCFGVCJb5pfQ ## Participants * chrism * ricky * andraz * manuel ## Minutes ### Transfer ZuseZ4/datasets to Rust-ML-WG * Datasets repository to Rust ML Group Repository * manuel is interested in transfering that over Zuse24 dataset repository * talking about having a list of datasets or places to go for Rust data loaders for ML set in the group * chrism mentioned that it would be nice to have multiple loaders for the datasets so it is widely availble to people * if it is a specific loader for something we might want to just link to it ### Rust ML Optimisation * jamal is looking into optimisations with FPGAs and GPUs * for a lot of FPGAs with open source we're not there yet * cool project called symbiflow * https://symbiflow.github.io/ * open source project that is taking that process and make it simplier * standarizing how to deploy to an FPGA * but that's a ways off * took a step back and looking at CPU optimizations * RISCV * open source instruction set architecture * https://riscv.org/ * wants to start here, anything beyond FPGAs for this is a no-starter since the environment is too new * here he was looking at arrayfire and ndarray a framework for neural networks * as they relate to embedded hardware ### SmartCore * chrism looked at the documentation for the code * docs are really good * mostly classical ML algorithms * bunch of different regressions * decision trees and random forest * nearest neighbor * linfa has one more clustering algorithm * SmartCore might only have k-means * Have not looked into their model evaluation * it is on arewelearningyet.com! ### arewelearningyet.com updates * updated info * direct links to linfa and smartcore * calling them out specifically as meta-repos for doing ML in Rust * we have green on the board! * one category has been updated from red -> yellow * we need to have a sort order to the libraries on arewelearningyet.com * talk with anthony on how to get this done * best sorting order * maybe a function of crates.io downloads and last commit? * datasets on arewelearningyet.com * another category for those * right now there's not enough repositories for adding another category * linfa datasets * https://github.com/rust-ml/linfa/tree/master/datasets * Talking about how crates.io blocked a publish crate with a github dependency * just chatting about the technical aspects around loading a dataset with a crate ### Open Source Licensing in Machine Learning * looking into licensing, ethics, privacy, and everythign around ML * reached out to the group of responsible AI licensesing * not helpful responses from them * nearing the end of the first draft of the blog post on licensing around ML * overview, in-the-weeds, and a good use case of licenses * description of why he feels using Apache2.0 and MIT is not sufficient for the libraries * the distinct lack of responsibility of the technology that we're not comfortable with in publishing a ML library * talked to a lawyer family member * IP lawyer is needed, they said it's magical and very specific * considering writing a first version of a license for open source ML libraries * however writing your own license is very very difficult * if anyone is interested to help review, reach out * he will find a way to send it so we can help out ### Online Learning Machinery in rust * high veloicty machine learning * written implemetnation of regession * online learning * most well known tool is Vowpal Wabbit * written in C++ * heavily specalized and very fast * however, if you narrowdown the problem and use Rust * you can make it 3x faster ;) * they wrote an internal tool to make this very fast * trying to open source it, licensing right now though and it's moving along * why it is interesting * this is written in Rust that others don't have * probably the fastest linear regression on normal CPUs * used internal for data science research * hyperparameter tuning * maxing out at memory bandwidth it's that fast * going to move from internal repository to github * so they are forced to maintain it there * optmized a lot * like instead of gzip using lz4 * downside is it's very narrow in what it takes and gives ### Rust ML Meetup November 27th * Rust ML WG member talks are listed here: * https://hackmd.io/7hStiyyHQcCCFGVCJb5pfQ ## Actions * chrism: talk to anthony about sort order for arewelearningyet.com * ricky: Possibly give some time to Andraz during Rust ML meetup talk to talk about the online learning tool they are open sourcing