# Project 1 Report
#### NYCU Spring 2023: AI Capstone
Author: 110652019 林楷傑
## 1. Image Datasets
### Description
[Data Source](https://www.cs.toronto.edu/~kriz/cifar.html)
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
### Data Preprocessing
- Data transforms (for data augmentation; a sketch follows this list)
    - Random rotation
    - Random horizontal flip
- Show some samples from the dataset

- Train/test split
    - CIFAR-10 already provides this split, so we use it directly.
    - We do not use cross-validation here.
    - Batch size is 128.
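A minimal sketch of this preprocessing pipeline with torchvision (the 15-degree rotation range is an assumption; the report only names the transform types):

```python
import torch
import torchvision
import torchvision.transforms as transforms

# Augmentation for training: random rotation and horizontal flip.
# The 15-degree rotation range is an assumed value.
train_transform = transforms.Compose([
    transforms.RandomRotation(15),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
test_transform = transforms.ToTensor()

# CIFAR-10 ships with its own train/test split.
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=train_transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=test_transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False)
```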
### Algorithms
#### Convolutional Neural Network
The convolutional layers apply filters to the input image to identify patterns and features, while the pooling layers reduce the dimensionality of the output.
- Architecture: five convolutional layers, each followed by ReLU activation and max-pooling (a sketch follows).
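A minimal sketch of such an architecture for 32x32 inputs (the channel widths are assumptions; the report only specifies the layer count and layer types):

```python
import torch.nn as nn

# Minimal sketch: five conv layers, each followed by ReLU and 2x2 max-pooling.
# Channel widths are assumed values.
class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        widths = [3, 32, 64, 128, 256, 512]
        layers = []
        for c_in, c_out in zip(widths[:-1], widths[1:]):
            layers += [
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),  # halves spatial size: 32 -> 16 -> 8 -> 4 -> 2 -> 1
            ]
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.features(x)  # (N, 512, 1, 1)
        return self.classifier(x.flatten(1))
```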
- Hyperparameters and optimizer
```python
epochs = 20
lr = 3e-4
weight_decay = 1e-5
gamma = 0.5
step_size = 10
# `model` is the CNN defined above.
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=gamma)
```
- Training
    - Training for 20 epochs took about 6 minutes.
    - Last epoch: val_loss 0.8691, val_acc 0.7108
- Model AUC 85.77%, accuracy 71.04% on the test data

#### Vision Transformer
The Vision Transformer (ViT) is an attention-based transformer architecture for images. According to its authors, ViT attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
- Architecture
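As a concrete illustration, here is a minimal patch-embedding-plus-encoder sketch for 32x32 inputs; the patch size, embedding dimension, depth, and number of heads are assumed values, not necessarily those used in this experiment:

```python
import torch
import torch.nn as nn

# Minimal ViT sketch for 32x32 CIFAR-10 images. Patch size 4 gives
# (32/4)^2 = 64 patch tokens; dim/depth/heads are assumed values.
class TinyViT(nn.Module):
    def __init__(self, dim=192, depth=6, heads=6, patch=4, num_classes=10):
        super().__init__()
        self.to_patches = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        num_tokens = (32 // patch) ** 2 + 1  # patches + [CLS]
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=dim * 4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        x = self.to_patches(x).flatten(2).transpose(1, 2)  # (N, 64, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])  # classify on the [CLS] token
```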

- Hyperparameters and optimizer
```python
epochs = 20
lr = 3e-4
weight_decay = 1e-5
gamma = 0.5
step_size = 10
# `model` is the ViT defined above.
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=gamma)
```
- Training
    - Training for 20 epochs took about 1 hour.
    - Last epoch: val_loss 0.9160, val_acc 0.7007
- Model AUC 87.77%, accuracy 70.08% on the test data

### Analysis
- Compare the results when using half of the training data.
    - The training time is roughly half of the original.
- CNN

- ViT

- Compare the results when using different classifiers.
- CNN Performance

- ViT Performance

- Compare the results when using different settings and/or hyper-parameters for the same classifier. I changed the learning rate to 1e-3, weight_decay to 1e-6, gamma to 0.5, and step_size to 5.
- CNN2 Performance

- ViT2 Performance

- It seems that the performance does not vary much across these settings.
- Compare with results given on the respective websites or in the literature, if available.
    - Since CIFAR-10 is a classic image dataset, many people have experimented with it:
    - [CNN-based (VGG16)](https://huggingface.co/edadaltocg/vgg16_bn_cifar10)
    - [Transformer-based (ViT)](https://huggingface.co/aaraki/vit-base-patch16-224-in21k-finetuned-cifar10)
    - Since my computing resources are limited, I could not reach the same high performance others report, but this project gave me hands-on experience with ML and DL algorithms.
## 2. Stock price prediction
### Description
[Data Source](https://tw.finance.yahoo.com/)
[yfinance API](https://pypi.org/project/yfinance/)
- yfinance offers a threaded and Pythonic way to download market data from Yahoo! Finance.
- Download MediaTek (聯發科) stock data from 2018 through the end of 2022, as sketched below.
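A minimal download sketch; `2454.TW` is MediaTek's ticker on the Taiwan Stock Exchange, and the exact date arguments are assumptions:

```python
import yfinance as yf

# Daily OHLCV data for MediaTek from 2018 through the end of 2022.
df = yf.download("2454.TW", start="2018-01-01", end="2022-12-31")
print(df.tail())
```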

### Data preprocessing
- Feature engineering
    - Compute the moving average (MA) and RMI values.
    - The target is the closing price 5 days in the future (sketched below).
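A sketch of the feature and target construction with pandas (the MA window lengths are assumptions, and the RMI computation is omitted here):

```python
# Moving averages over a few common windows (window choices are assumed).
for w in (5, 10, 20):
    df[f"MA_{w}"] = df["Close"].rolling(window=w).mean()

# Target: the closing price 5 trading days ahead.
df["future_close"] = df["Close"].shift(-5)

# Drop rows left incomplete by the rolling windows and the shift.
df = df.dropna()
```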
- Correlation Heatmap

### Algorithms
#### Random Forest Regressor
- [Use sklearn library](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html)
- A random forest is a meta estimator that fits a number of decision tree regressors on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

#### Linear Regression
- [Use sklearn library](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)
- LinearRegression fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation.
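A minimal sketch fitting both models, assuming `X` holds the engineered features and `y` the 5-day future close from the preprocessing step:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# 90/10 split; shuffle=False preserves the time ordering of the series.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.9, shuffle=False)

for model in (RandomForestRegressor(n_estimators=100, max_depth=4, random_state=42),
              LinearRegression()):
    model.fit(X_train, y_train)
    print(type(model).__name__, "R^2:", model.score(X_test, y_test))
```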

### Analysis
- Training evaluation with cross-validation
    - Since the target is continuous, I did not use a confusion matrix here.
    - [Cross Validation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html)
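A sketch of the K-fold evaluation for both models (the number of splits and the scoring metric are assumptions; sklearn reports MSE as a negative score):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for model in (RandomForestRegressor(random_state=42), LinearRegression()):
    scores = cross_val_score(model, X, y, cv=kf,
                             scoring="neg_mean_squared_error")
    print(type(model).__name__, "mean MSE:", -scores.mean())
```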
- Random Forest:

- Linear Regression:

- Compare the results when using different amounts of training data.
    - Train size = 70% (the other experiments use 90%)
- Random Forest

- Linear Regression

- Compare the results when using different settings and/or hyper-parameters for the same classifier.
    - Tune different hyper-parameters of the random forest; a comparison loop is sketched below.
- (n_estimators, max_depth) = (50, 4)

- (n_estimators, max_depth) = (20, 8)

- (n_estimators, max_depth) = (100, 4)
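The three configurations above can be compared with a small loop, sketched here (the scoring metric is an assumption):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Evaluate each (n_estimators, max_depth) pair listed above.
for n_estimators, max_depth in [(50, 4), (20, 8), (100, 4)]:
    model = RandomForestRegressor(n_estimators=n_estimators,
                                  max_depth=max_depth, random_state=42)
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_squared_error")
    print((n_estimators, max_depth), "mean MSE:", -scores.mean())
```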

## 3. Music Classification
### Description
- I downloaded some EDM tracks from SoundCloud and classified them into two categories: house music and non-house music.
### Data preprocessing
- Feature extraction
    - Use the librosa Python library to extract MFCC features, as sketched below. MFCC feature extraction basically involves windowing the signal, applying the DFT, taking the log of the magnitude, and then warping the frequencies onto a Mel scale, followed by applying the inverse DCT.
    - [librosa](https://librosa.org/doc/latest/index.html)
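A minimal extraction sketch (the number of coefficients and the mean aggregation over time are assumptions; the file path is hypothetical):

```python
import librosa

def extract_mfcc(path, n_mfcc=20):
    # Load the audio (librosa resamples to 22050 Hz by default).
    y, sr = librosa.load(path)
    # MFCC matrix of shape (n_mfcc, n_frames).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Average each coefficient over time into one fixed-length vector.
    return mfcc.mean(axis=1)

features = extract_mfcc("track.mp3")  # hypothetical file path
```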
- Correlation Heatmap
    - We can see that many features have a positive correlation with the label.

### Algorithms
#### Decision Tree Classifier
- [Use sklearn library](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier)
- A decision tree is a non-parametric supervised learning algorithm, which is utilized for both classification and regression tasks. It has a hierarchical, tree structure, which consists of a root node, branches, internal nodes and leaf nodes.
#### K-Nearest Neighbors Classifier
- [Use sklearn library](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)
- The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric, supervised learning classifier, which uses proximity to make classifications or predictions about the grouping of an individual data point.
### Analysis
- Training evaluation and model comparison, using 80% of the data for training (cross-validation / precision / recall / F1 / AUROC); the evaluation loop is sketched below.
    - `kf = KFold(n_splits=10, shuffle=True, random_state=42)`
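A sketch of this evaluation loop, assuming `X` holds the MFCC features and `y` the binary house/non-house labels (the metric names follow the list above):

```python
from sklearn.model_selection import KFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

kf = KFold(n_splits=10, shuffle=True, random_state=42)
scoring = ["precision", "recall", "f1", "roc_auc"]
for model in (DecisionTreeClassifier(max_depth=5),
              KNeighborsClassifier(n_neighbors=1)):
    results = cross_validate(model, X, y, cv=kf, scoring=scoring)
    print(type(model).__name__,
          {m: round(results[f"test_{m}"].mean(), 3) for m in scoring})
```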
- Decision Tree(max_depth=5)


- KNN(n_neighbors=1)


- Compare the results when using 70% training data (using the decision tree to illustrate).
- Decision Tree(max_depth=5)


- Compare the results when using different settings and/or hyper-parameters for the same classifier.
- Decision Tree(max_depth=10)


- KNN(n_neighbors=10)


## Discussion
- In these experiments, I used a CNN and a ViT for image classification, random forest and linear regression for stock price prediction, and a decision tree and KNN for music genre classification. The results matched my expectations, since I did not make many performance-oriented improvements.
- Although the choice of algorithm is important, data preparation and feature engineering matter just as much: well-performing models are built on good-quality data.
- If I had more time for this project, I would train bigger models on higher-quality data. I would like to compare ResNet with a deeper Vision Transformer, and to combine CNNs with transformers to reach better performance.
- This was my first time hand-writing the transformer architecture, which was pretty interesting. I also learned that collecting data can be exhausting; in the future, I will be more appreciative of good data sources.