# Machine Learning Internship

Interns
---
- Burhanuddin Rangwala
- Vedant Mahadik

Work 1: Model validation
------
Author/s:
- Burhanuddin Rangwala
- Vedant Mahadik

#### :link: Issue: [#166](https://github.com/Auto-DL/Auto-DL/issues/166)
#### :link: PR: [#202](https://github.com/Auto-DL/Auto-DL/pull/202)

Description:
Added a layer-level validation check in the frontend that checks whether an inserted or edited layer creates any shape-related issue and, if it does, returns an array for further client-side validation.

Blockers:
- The Step2.js file is vast and was difficult to understand.
- Many layers have two names, for example the ZeroPadding layer appears as both ZeroPadding2d and ZeroPad2d, which may cause errors because the dimensions are accessed by layer name.
- LSTM may have different output dimensions depending on the return sequence setting.

Takeaway:
Learnt about the inner workings of model compilation and layer shapes.

Suggestions for future:
- Add a model-level validation check that simply iterates through all the layer-level validation checks.
- Also make a separate file for all the JSON objects.

Work 2: Centralize JSON object
------
Author/s:
- Burhanuddin Rangwala
- Vedant Mahadik

#### :link: Issue: [#211](https://github.com/Auto-DL/Auto-DL/issues/211)
#### :link: PR: [#234](https://github.com/Auto-DL/Auto-DL/pull/234)

Description:
Make a centralized JSON object for layer, hyperparameter, preprocessing, optimizer and loss function details. This is to reduce the size of Step2.js in the future.

Blockers:
- Step2.js has temp_pre_meta, which is the same for both Keras and PyTorch, so it is left in Step2.js.

Takeaway:
Learnt about the importance of structure in a codebase.

Work 3: Remove local object from validation.js
----
Author/s:
- Burhanuddin Rangwala
- Vedant Mahadik

#### :link: Issue: [#211](https://github.com/Auto-DL/Auto-DL/issues/211)
#### :link: PR: [#234](https://github.com/Auto-DL/Auto-DL/pull/234)

Description:
Make use of the centralized JSON object for layers in validation.js instead of the local object.

Work 4: Create a recommendation model for layers
----
Author/s:
- Burhanuddin Rangwala
- Vedant Mahadik

Description:
Suggest to the user the next layer most suitable for the model, given the current configuration.

Plan:
- Use GitHub's SDK to scrape models created by other users in open source, and use these models to predict/suggest the next layer to the user.
- Create a new repository to hold all the models, then send the most suitable model to the main repository.

Work 5: Basic structure and pipeline and write scripts to run the code.
----
Author/s:
- Burhanuddin Rangwala
- Vedant Mahadik

#### :link: Issue: [#1](https://github.com/Auto-DL/Recommendation-System/issues/1)
#### :link: PR: [#3](https://github.com/Auto-DL/Recommendation-System/pull/3)

Description:
- Basic structure and workflow for getting URLs of repositories containing TensorFlow models using the GitHub API and then scraping them with Selenium.
- Create a script to execute the code from the command line, to be used on the server.

Blockers:
- Selenium takes up to 2.5 sec per file/tree for scraping.
- The GitHub API has a limit, so we can't scrape multiple months.
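
One common way to work within the GitHub API limit noted in the blockers is to split a long date range into short windows and issue one search query per window. The sketch below only builds the query URLs and makes no network calls. The search term `tensorflow`, the `language:python` qualifier, the window length and the helper names are assumptions for illustration, not the project's actual code.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# GitHub REST endpoint for repository search.
SEARCH_URL = "https://api.github.com/search/repositories"


def date_windows(start, end, days):
    """Yield (window_start, window_end) pairs covering start..end inclusive."""
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


def build_search_urls(start, end, days=7, term="tensorflow"):
    """Build one search URL per date window (term and qualifiers are illustrative)."""
    urls = []
    for lo, hi in date_windows(start, end, days):
        # `created:A..B` restricts results to repositories created in the window.
        query = f"{term} language:python created:{lo.isoformat()}..{hi.isoformat()}"
        urls.append(f"{SEARCH_URL}?{urlencode({'q': query, 'per_page': 100})}")
    return urls
```

Each URL can then be fetched in turn, with a pause between requests, so that a multi-month range is collected as many small queries rather than one large one.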

Takeaway:
Learnt to scrape websites and to use the GitHub API.

Work 6: Writing tests for scraping
----
Author/s:
- Burhanuddin Rangwala
- Vedant Mahadik

#### :link: Issue: [#4](https://github.com/Auto-DL/Recommendation-System/issues/4)
#### :link: PR: [#10](https://github.com/Auto-DL/Recommendation-System/pull/10)

Description:
Writing tests using pytest for the model scraping functions and the code-pickling function.

Blockers:
- A lot of functions do not return values.
- Many functions are nested, so testing was complex.
- Many functions are interdependent, so unit testing is not efficient.

Takeaway:
Learnt pytest.

Work 7: Add support for ipynb files
----
Author/s:
- Burhanuddin Rangwala
- Vedant Mahadik

#### :link: Issue: [#8](https://github.com/Auto-DL/Recommendation-System/issues/8)
#### :link: PR: [#12](https://github.com/Auto-DL/Recommendation-System/pull/12)

Description:
Add the ability to scrape files that are written as IPython (.ipynb) notebooks.

Blockers:
- IPython notebooks take longer to load, so Selenium has to wait longer before it starts scraping.

Work 8: Workflow for Pytest
----
Author/s:
- Burhanuddin Rangwala
- Vedant Mahadik

**For main repository**
#### :link: Issue: [#273](https://github.com/Auto-DL/Auto-DL/issues/273)
#### :link: PR: [#286](https://github.com/Auto-DL/Auto-DL/pull/286)

**For recommendation repository**
#### :link: Issue: [#14](https://github.com/Auto-DL/Recommendation-System/issues/14)
#### :link: PR: [#15](https://github.com/Auto-DL/Recommendation-System/pull/15)

Description:
Make use of GitHub Actions to run tests on each PR.

Takeaway:
Learnt GitHub Actions.

Work 9: Use multiprocessing for collecting data
----
Author/s:
- Burhanuddin Rangwala
- Vedant Mahadik

#### :link: Issue: [#16](https://github.com/Auto-DL/Recommendation-System/issues/16)
#### :link: PR: [#17](https://github.com/Auto-DL/Recommendation-System/pull/17)

Description:
Get URL links using the GitHub API, then use multiprocessing to scrape data from the repositories.
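
A minimal sketch of this pattern, assuming the repository URLs have already been fetched from the GitHub API. `scrape_repository` is a hypothetical stand-in for the real Selenium-based scraper and only parses the URL, so the example runs without a browser or network access.

```python
from multiprocessing import Pool


def scrape_repository(url):
    """Hypothetical stand-in for the real scraper: returns (url, result)."""
    # The real worker would drive Selenium here; this stub only parses the URL.
    owner, name = url.rstrip("/").split("/")[-2:]
    return url, {"owner": owner, "name": name}


def scrape_all(urls, workers=4):
    """Scrape every URL in parallel worker processes and collect the results."""
    with Pool(processes=workers) as pool:
        # imap_unordered yields each result as soon as a worker finishes it,
        # so one slow repository does not hold up the others.
        return dict(pool.imap_unordered(scrape_repository, urls))
```

When run as a script, `scrape_all(repo_urls, workers=2)` should be called under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that start workers by spawning a fresh interpreter.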

Takeaway:
Learnt multiprocessing and the use of AWS servers.

Work 10: Create a proposal doc
----

# Proposal Document

### Introduction:-
The problem at hand is the automated, real-time prediction of the next layer in a model, to assist the user in building a no-code/code TensorFlow/PyTorch model.

### Proposed Solution:-
Use **RNNs (LSTM)** to create a sequential model that predicts the next layer in the model. For now the model will be a generic sequential model, but it will be converted into a **SOTA** model after future research. The model will be created using PyTorch, as its data handling capabilities will make the workflow smoother and make it easier to work with the pickle files.

### Constraints:-
1. The data that we have might be limited. (We need to check how much data was collected during the 1-month test run on AWS.)
2. The data might not be diverse, or might be low quality or outright wrong, which would make the model ineffective.
3. The model needs to run and make predictions on a server, potentially for multiple users, so it needs to be small for quick response times.

### Alternate Solution:-
Keep the model on the client side by using TensorFlow.js. This could save time, but it could also hinder the user experience, since it will take 5-6 seconds to load the model initially. We could use async/await as a workaround for this, and inference would most likely use the same approach. With this setup no data is passed back and forth to the server and everything happens on the client side, which could be faster than sending the data to the server, making an inference there and sending the prediction back to the client. However, inference could be faster on a dedicated server than in the web browser (and some browsers might not support TensorFlow.js), which could offset the time saved by TensorFlow.js. We could not find benchmarks for this.
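
The trade-off above can be framed as a break-even estimate: how many predictions a user must request before the one-time client-side load pays for itself. The sketch below assumes the model load is the only extra cost on the client. Apart from the 5-6 second load mentioned above, every timing is a placeholder to be replaced with measured values, and the function name is hypothetical.

```python
import math


def break_even_predictions(load_s, client_infer_s, server_infer_s, round_trip_s):
    """Smallest number of predictions at which the client-side total time
    is no longer slower than the server-side total time.

    Returns None when the client never catches up, i.e. when a client-side
    prediction is not cheaper than a server-side prediction plus round trip.
    """
    server_cost = server_infer_s + round_trip_s  # per prediction, server side
    saving = server_cost - client_infer_s        # time saved per prediction
    if saving <= 0:
        return None
    return math.ceil(load_s / saving)


# Placeholder timings in seconds, not measurements.
estimate = break_even_predictions(
    load_s=5.5, client_infer_s=0.05, server_infer_s=0.03, round_trip_s=0.2
)
print(estimate)
```

If the measured break-even count is higher than the number of suggestions a typical session needs, the server-side setup is likely the better default.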

### Conclusion:-
Both **MLOps** architectures will need to be tested to find the better one. We propose doing this in the future, after the MVP is made.

Work 11: Create Model
----
Author/s:
- Burhanuddin Rangwala
- Vedant Mahadik

#### :link: Issue: [#18](https://github.com/Auto-DL/Recommendation-System/issues/18)
#### :link: PR: [#20](https://github.com/Auto-DL/Recommendation-System/pull/20)

Description:
Made a model with the data collected using the scraping scripts.

Work 12: Create API endpoint for model
----
Author/s:
- Burhanuddin Rangwala
- Vedant Mahadik

#### :link: Issue: [#19](https://github.com/Auto-DL/Recommendation-System/issues/19)
#### :link: PR: [#21](https://github.com/Auto-DL/Recommendation-System/pull/21)

Description:
Made an API endpoint that serves the model built from the data collected using the scraping scripts.

Takeaway:
Learnt how to use FastAPI.

Work 13: Integrate model with main app
----
Author/s:
- Burhanuddin Rangwala
- Vedant Mahadik

**For main repository**
#### :link: Issue: [#310](https://github.com/Auto-DL/Auto-DL/issues/310)
#### :link: PR: [#313](https://github.com/Auto-DL/Auto-DL/pull/313)

**For recommendation repository**
#### :link: PR: [#24](https://github.com/Auto-DL/Recommendation-System/pull/24)

Description:
Solved CORS issues and tested the model with the main app.

#### :link: [Task List & Status](https://trello.com/b/Wc9NtL9r)

###### tags: `Summer` `Internship` `Book` `2021`