# 112-1 Intro to AI

## Homework 6

###### tags: `1121iai`

TA: 蘇蓁葳

Authors: 吳東祈 [d07b48002@ntu.edu.tw](mailto:d07b48002@ntu.edu.tw) and 蘇蓁葳

This is Homework 6 of __人工智慧概論 / Introduction to Artificial Intelligence__, Department of Biomechatronics Engineering, National Taiwan University.

## Introduction

Deep learning (DL) is a machine learning technique that __teaches computers to learn by example__. The key to DL's success is that it uses artificial neural networks to learn complex patterns in data. DL has one critical problem, though: it needs __a lot of__ data to work well, and for many tasks, that amount of data simply isn't available.

### Large language models

Among the most powerful models in the era of DL are large language models (LLMs). They are machine learning models trained on a __massive amount__ of text data to either generate human-like text or perform language-related tasks. These models are designed to __understand__ and __generate__ text in a way that mimics human language patterns and structures, and they can be thought of as the next generation after more traditional natural language processing (NLP) techniques.

The architecture of an LLM varies depending on the specific implementation. However, most LLMs use a __transformer__-based architecture, a deep learning architecture first introduced in the 2017 paper ["Attention Is All You Need"](https://doi.org/10.48550/arXiv.1706.03762).

### Transfer learning

Before 2016, most people randomly initialized a new model for each task they wanted to train it on. This meant that **models had to learn everything they needed to know just from the examples in the training data.** To solve any non-trivial task, a neural network has to learn a lot about the structure of its input data, so training a model from scratch requires a lot of data.

That changed in 2018, when two hugely influential papers landed, introducing [ULMFiT](https://arxiv.org/abs/1801.06146) and, later, [BERT](https://arxiv.org/abs/1810.04805). These were the first papers to make transfer learning in natural language processing work really well, and BERT in particular marked the beginning of the era of pre-trained large language models.

The key reason transfer learning works is that **a lot of what you need to know to solve a task is not specific to that task!** By picking a task where training data is abundant, we can get a neural network to learn that kind of "domain knowledge" and then apply it to new tasks we care about, where training data might be much harder to come by.

### ESM-2

ESM-2 is a state-of-the-art general-purpose protein language model. It can be used to predict structure, function, and other protein properties directly from individual sequences. It was released with [Lin et al. 2022](https://www.science.org/doi/abs/10.1126/science.ade2574).

## To-Do and homework policy

Please refer to the [Google Colab notebook](https://colab.research.google.com/drive/1v5ngm4TDCFW55C9r5eXv3DDiJlgIdaQu?usp=sharing) for detailed instructions.

### TODO #1 (on the original dataset) (5%)

- Run the reference code for inference with the fine-tuned model (1, `trainer_finetuned`).
- Check your result on the Hugging Face website!

> Remember to sign up for Hugging Face.

### TODO #2 (10%)

- Load only the checkpoint configuration instead of the pre-trained model, train it, and evaluate the model on both the training dataset and the testing dataset (2, `trainer_config`); see the sketch after this list.

> You can use `AutoConfig` to load the configuration from the model checkpoint, and then use `AutoModelForSequenceClassification.from_config()` to build a model from that configuration.
> Use `trainer.evaluate()` to evaluate each dataset.

- Use the original pre-trained model weights (without fine-tuning) to evaluate on both the training data and the testing data, and record the accuracy for each (3, `trainer_pretrained`).
- Compare and report the results of the three models (e.g., 1 > 2 > 3).
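The snippet below is a minimal sketch of what TODO #2 asks for, not the reference implementation. It assumes that `tokenizer`, `train_dataset`, `test_dataset`, and `compute_metrics` are already defined in earlier notebook cells; the checkpoint name, `num_labels`, and the `TrainingArguments` values are placeholders, so adapt them to match the notebook.

```python
from transformers import (
    AutoConfig,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Placeholder checkpoint; use the same ESM-2 checkpoint as the reference notebook.
model_checkpoint = "facebook/esm2_t12_35M_UR50D"

# Load only the configuration (architecture hyperparameters), not the weights.
# Set num_labels to match your dataset.
config = AutoConfig.from_pretrained(model_checkpoint, num_labels=2)

# from_config() builds the model with randomly initialized weights,
# so nothing is transferred from pre-training.
model = AutoModelForSequenceClassification.from_config(config)

# tokenizer, train_dataset, test_dataset, and compute_metrics are assumed to
# come from earlier notebook cells; the TrainingArguments here are illustrative.
trainer_config = Trainer(
    model=model,
    args=TrainingArguments(output_dir="esm2-from-config", num_train_epochs=3),
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

trainer_config.train()
print(trainer_config.evaluate(train_dataset))  # metrics on the training split
print(trainer_config.evaluate(test_dataset))   # metrics on the testing split
```

Because this model starts from random weights, comparing it with the fine-tuned and pre-trained-only models shows how much of the performance comes from pre-training rather than from the architecture alone.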
### TODO #3 (on the new dataset) (20%)

- Explore the SCOP dataset (number of classes, number of instances in each class).
- Create plots that summarize the dataset in the Colab notebook.

### TODO #4 (40%)

- Complete the code for classification.

> Note: the number of classes differs from the original dataset.

- Calculate the F1 score for each class.
- Explain in the Colab notebook which F1 score you use for multi-class classification (a hedged sketch of the averaging options appears in the appendix at the end of this document).

> More about the F1 score can be found at https://huggingface.co/spaces/evaluate-metric/f1/blob/main/f1.py

### TODO #5 (15%)

- Copy the learning curve from the TensorBoard on Hugging Face and paste the result into the Colab notebook.
- Try different numbers of epochs.
- Observe and report the results in the Colab notebook.

### TODO #6 (10%)

- Try different learning rates.
- Try a simpler model (`esm2_t6_8M_UR50D`).
- Observe and report the results in the Colab notebook.

### Submission

- All the results should be in the Colab notebook.
- Download your .ipynb file.
- Please keep the execution output record of the Colab notebook.
- Submit to NTU COOL.
- Please submit using the format below:

```
# In lowercase
{student_id}_hw6.ipynb

# Example
b09611048_hw6.ipynb
```

- Deadline: 12/26 (Tues.) 23:59

## Reference

1. [Deep learning with proteins](https://huggingface.co/blog/deep-learning-with-proteins)
2. [Introduction to Large Language Models](https://medium.com/the-llmops-brief/introduction-to-large-language-models-9ac028d34732)

## Acknowledgments

Thanks to the TA team and Professor CYC.
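## Appendix: multi-class F1 sketch (for TODO #4)

The snippet below is a hedged illustration of the `average` options exposed by the `f1` metric in the `evaluate` library linked in TODO #4. The predictions and references are made-up three-class placeholders, not SCOP results.

```python
import evaluate

# Made-up labels for a three-class problem (placeholders, not SCOP data).
references  = [0, 1, 2, 2, 1, 0, 2]
predictions = [0, 2, 2, 2, 0, 0, 1]

f1 = evaluate.load("f1")

# average=None: one F1 score per class (what "F1 score for each class" asks for).
print(f1.compute(predictions=predictions, references=references, average=None))

# average="macro": unweighted mean of the per-class scores (every class counts equally).
print(f1.compute(predictions=predictions, references=references, average="macro"))

# average="micro": computed from the pooled counts; for single-label multi-class
# classification this equals overall accuracy.
print(f1.compute(predictions=predictions, references=references, average="micro"))

# average="weighted": per-class scores weighted by class support, which may matter
# if the SCOP classes are imbalanced.
print(f1.compute(predictions=predictions, references=references, average="weighted"))
```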