RecSys Project - Hii

# **RECOMMENDER SYSTEM: MATCHING CLOTHES**

:::success
### **A. OVERVIEW**
:::

#### **1. TEAM MEMBERS:**

Full name | Gmail Address | CoTAI Class
:---: | :---: | :---:
Tran Chanh Hy | hychanhtran@gmail.com | ML4AI

- **Colab Notebook:** [Colab Notebook](https://colab.research.google.com/drive/1S1QKNU2sbaQZzXk2FtNS0PkWNbaWlJ2D?usp=sharing)
- **Github:** [Final-Project-ML4AI](https://hackmd.io/_uploads/S1m06abQC.png)
- **Video demo:** [Coming soon]()

#### **2. PROJECT INFORMATION:**

- **Task** $\mathcal T$: suggest clothes that match each other.
- **Experience** $\mathcal E$:
    - *Amazon_Metadata_2018:* [Meta Data On Fashion](https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/)
    - *FashionVC:* [Data On Matching Tops and Bottoms](https://github.com/wangyu-ustc/PairFashionExplanation/tree/main)
    - *Indico Pretrained Model:* [Clothes Matching Pretrained Model](https://indicodata.ai/blog/fashion-matching-tutorial/)
    - *Img2Vec Pretrained Model:* [Vectorizing images Pretrained Model](https://github.com/christiansafka/img2vec)
- **Function space** $\mathcal F$: `CNN`, `kMeans`, `kNN`
- **Performance** $\mathcal P$: `cross-entropy loss | accuracy`
- **Algorithm** $\mathcal A$:
    - Prediction: `cosine_similarity` + `thresholding`

#### **3. MASTER PLAN**

Timeframe |Work |Progress|Note|
:---------------------:|:----------------------------------|:------:|:--:|
April 26th - April 27th|Searching for dataset |Done |3 datasets|
April 28th - May 04th |Preprocessing dataset |Done | |
May 05th - May 06th |Search for model: Vectorize images |Done |CNN + kNN|
May 06th - May 08th |Build CNN: Category classification |Working |For RecSys|
May 08th - May 09th |Build kNN: Find matching items |Working |Same category|
May 09th - May 11th |Build Streamlit Interface |Working | |
May 11th - May 13th |Prepare PPT presentation + ReadMe |Working | |

:::success
### **B. PRE-PROCESSING DATA**
:::

- **Dataset source:**
    - [meta_Clothing_Shoes_and_Jewelry_2018](https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/)
    - [FashionVC](https://github.com/wangyu-ustc/PairFashionExplanation/tree/main)
- **Dataset Drive:** [Drive](https://drive.google.com/drive/folders/1IRR5t_YsvdAuKIRagbfxvm0ZUk1TSpQM?usp=drive_link) (add a shortcut to your own Drive to get access to the datasets).

#### **1. PROCESSED DATA INFO:**

```python!
RangeIndex: 52456 entries, 0 to 61514
Data columns (total 6 columns):
 0   asin             52456 non-null  object
 1   title            52456 non-null  object
 2   imageURLHighRes  52456 non-null  object
 3   also_buy         52456 non-null  object
 4   category         52456 non-null  object
 5   ground_truth     52456 non-null  object
```

#### **2. LOAD CLEAN DATA:**

```python
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')
df = pd.read_csv("/content/drive/MyDrive/Dataset/clean_Dataset.csv")
```

:::success
### **C. MACHINE LEARNING ALGORITHM**
:::

- **Reference:** [Matching Clothes For E-commerce](https://www.youtube.com/watch?v=tLE7EoCKaBw&list=LLPe1RJhd3nJM9zarwgJMVPA&index=2&t=211s)
- **General framework:**

$$
\begin{CD}
~~\text{Product Image} @>\text{Img2Vec}>\text{or CNN}> \text{Image Embeddings} @>\text{Cosine Similarity}>\text{kNN}> \text{Product - to - product matrix} \\
@. @V\text{kMeans}V\text{and/or MLP}V @V\text{Threshold}VV\\
@. \text{Category} @.\text{Prediction}
\end{CD}
$$

#### **1. FEATURE EXTRACTION**

First **`vectorize`** the images with a **pretrained model**, then try a self-built **`CNN`**:

- **Pretrained model:** [PyTorch Img2Vec](https://github.com/christiansafka/img2vec)
- **Model architecture:**

```python
# Layer definitions of the ResNet-style backbone (excerpt from a constructor, not runnable on its own)
conv1 = nn.Conv2d(3, 64, kernel_size = 7, stride = 2, padding = 3, bias = False)
bn1 = nn.BatchNorm2d(64)
relu = nn.ReLU(inplace = True)
maxpool = nn.MaxPool2d(kernel_size = 3, stride = 2, padding = 1)
layer1 = self._make_layer(block, 64, layers[0])
layer2 = self._make_layer(block, 128, layers[1], stride = 2)
layer3 = self._make_layer(block, 256, layers[2], stride = 2)
layer4 = self._make_layer(block, 512, layers[3], stride = 2)
avgpool = nn.AvgPool2d(7)
fc = nn.Linear(512 * block.expansion, num_classes)
```

- **Load `pretrained_model_embeddings`:**

```python
import pickle

with open('/content/drive/MyDrive/Dataset/pretrained_model_embeddings.pkl', 'rb') as f:
    pre_embeddings = pickle.load(f)
```

>**IMPROVEMENT:**

- Build a **`CNN`** model with a **`softmax`** activation for **feature extraction**:

```python
Model: "sequential_6"
___________________________________________________________
 Layer (type)          Output Shape              Param #
===========================================================
 Rescaling             (None, 512, 512, 3)       0
 Conv2D                (None, 512, 512, 64)      1728
 Conv2D                (None, 512, 512, 64)      36864
 BatchNormalization    (None, 512, 512, 64)      256
 Activation: ReLu      (None, 512, 512, 64)      0
 MaxPooling2D          (None, 256, 256, 64)      0
 Dropout               (None, 256, 256, 64)      0
 Conv2D                (None, 256, 256, 32)      2048
 Conv2D                (None, 256, 256, 128)     102400
 BatchNormalization    (None, 256, 256, 128)     512
 Activation: ReLu      (None, 256, 256, 128)     0
 MaxPooling2D          (None, 128, 128, 128)     0
 Dropout               (None, 128, 128, 128)     0
 Conv2D                (None, 128, 128, 64)      8192
 Conv2D                (None, 128, 128, 256)     409600
 BatchNormalization    (None, 128, 128, 256)     1024
 Activation: ReLu      (None, 128, 128, 256)     0
```

- **Load** `CNN_embeddings`:

```python
import pickle

with open('/content/drive/MyDrive/Dataset/CNN_embeddings.pkl', 'rb') as f:
    CNN_embeddings = pickle.load(f)
```

#### **2. CLASSIFICATION INTO CATEGORIES**

##### **$\longrightarrow~$ KMEANS ALGORITHM**

- **Support library:**
    - *KMeans:* [sklearn.cluster.KMeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html)
    - *PCA:* [sklearn.decomposition.PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)
    - *Plotly:* [plotly.express](https://plotly.com/python/plotly-express/)
- **Apply `PCA`** to reduce the `embeddings` to 3 dimensions:

```python
from sklearn.decomposition import PCA

n_dimension = 3
# Fit a full PCA, then keep only the first n_dimension principal components
pca = PCA().fit(embeddings)
embedding_pca = pca.transform(embeddings)[:, 0:n_dimension]
```

$\boxed{\text{MSE for PCA: }~0.98764}$

- **Visualize** the clustering:

![image](https://hackmd.io/_uploads/B12QMMczC.png)|
:---:|
The dataset is best divided into **7 categories**.

- **Visualize** images:

![image](https://hackmd.io/_uploads/Skj52j1QC.png)|![shoes](https://hackmd.io/_uploads/B1-rTjJQR.png)
:---:|:---:
Accessory: good classification| Shoes: good classification

![drop](https://hackmd.io/_uploads/SyzLRo1X0.png)| ![mix](https://hackmd.io/_uploads/SJE9AoyX0.png)|
:---:|:---:|
Non-clothing items: good classification| 3 mixed clusters: bad classification|

- On general observation, `kMeans` seems to treat all classes equally, so it is not well suited to this problem.
- However, the clustering shows that the dataset includes items that are not clothes. Let's see how `MLP` handles this.
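The PCA projection and `kMeans` clustering described in this subsection can be sketched roughly as follows. This is only an illustration under stated assumptions: the embeddings are random 512-dimensional stand-ins (not the real Img2Vec output), the helper `cluster_embeddings` and the seed are hypothetical, and `PCA(n_components=3)` is used instead of slicing a full PCA, which should give an equivalent projection up to solver differences. The 7 clusters follow the observation above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def cluster_embeddings(embeddings, n_dimension=3, n_clusters=7, seed=42):
    """Project embeddings with PCA, then group them with kMeans.

    Hypothetical helper: returns the projected points, one cluster label
    per item, and the share of variance kept by the projection.
    """
    pca = PCA(n_components=n_dimension, random_state=seed)
    embedding_pca = pca.fit_transform(embeddings)

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(embedding_pca)

    retained = float(pca.explained_variance_ratio_.sum())
    return embedding_pca, labels, retained


# Synthetic stand-in for the image embeddings (assumption: 512-d vectors)
rng = np.random.default_rng(0)
demo_embeddings = rng.normal(size=(200, 512))

demo_pca, demo_labels, demo_retained = cluster_embeddings(demo_embeddings)

# Cluster sizes show how the items are distributed across the groups
cluster_sizes = np.bincount(demo_labels, minlength=7)
print(demo_pca.shape, cluster_sizes, round(demo_retained, 3))
```

Inspecting `cluster_sizes` on the real embeddings is one way to check whether the clusters are roughly balanced even when the true categories are not, which is the concern raised above.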
##### **$\longrightarrow~$ MLP ALGORITHM**

- **Split** the dataset into a training set and a test set (the test set is also used for validation):

```python
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(all_Images, ground_truths,
                                                    stratify = ground_truths,
                                                    test_size = 0.4,
                                                    shuffle = True,
                                                    random_state = 42)
```

- **Create an `MLP` prediction layer** using `softmax` for classification:

```python
Model: "sequential"
_______________________________________________
 Layer    Output Shape         Param #
===============================================
 Dense    (None, 512, 64)      32832
 Dense    (None, 512, 32)      2080
 Dense    (None, 512, 7)       231
===============================================
Total params: 35143 (137.28 KB)
Trainable params: 35143 (137.28 KB)
Non-trainable params: 0 (0.00 Byte)
```

- **Fit** the model with the training data using:
    - loss = `binary_crossentropy`
    - optimizer = `adam`
    - metrics = `accuracy`
    - validation = `test_set`

```python
# Note: for a 7-way softmax output, 'categorical_crossentropy' is the usual loss;
# 'binary_crossentropy' is kept here as in the original run.
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
history = model.fit(x_train, y_train, validation_data = (x_test, y_test), epochs = 5, verbose = 1)
```

- **Plot** **`accuracy | loss`** over the `epochs`:

![image](https://hackmd.io/_uploads/r1FS7zGQC.png)|
:---:|

#### **3. CREATE SIMILARITY_MATRIX**

- **Support library:**
    - *Cosine_Similarity*: [sklearn.metrics.pairwise](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html)
    - *kNN*: [kNN]()
    - *Heatmap*: [seaborn](https://seaborn.pydata.org/generated/seaborn.heatmap.html)
- **Calculate** the `cosine_similarity` between products:

```python
from sklearn.metrics.pairwise import cosine_similarity

product_to_product_similarity_matrix = cosine_similarity(embeddings)
```

- **Load** the precomputed `cosine_similarity`:

```python
import pickle

with open('/content/drive/MyDrive/Dataset/Pretrained_model_cosine_similarity.pkl', 'rb') as f:
    product_to_product_similarity_matrix = pickle.load(f)
```

- **Visualize** the similarity between products: Heatmap here
- Now, to find items that match a piece of clothing, follow the **prediction pipeline:**
    - Upload an *image* to test; in this case, I pick a random image.
    - Find the 5 nearest items with `kNN`.
    - Check the `cosine_similarity` between the uploaded image and the `also_buy` items of those 5 items:

$$
\boxed{\text{Similar items to other products } \left \lbrace \begin{array}{ll} \text{matching} & \text{if similarity} > \text{threshold} \\ \text{skip} & \text{if similarity} \le \text{threshold} \end{array} \right.}
$$

- Apply **`kNN`** to find the most similar items:

```python
from sklearn.neighbors import KNeighborsClassifier

KNN = KNeighborsClassifier(n_neighbors = 5)
KNN.fit(embeddings, also_buy)
```

#### **4. EVALUATE MODEL**

- Scores obtained:

-- | MLP | kMeans |
:---:|:---:|:---:|
**CNN** | | |
**Img2Vec** | 0.83 | 0.24 |

- Evaluating with a pretrained model: [Indico](https://indicodata.ai/blog/fashion-matching-tutorial/)

To be updated soon.

<style> .green {color: green;} </style>
<style> .red {color: red;} </style>

![image](https://hackmd.io/_uploads/SytphXImC.png)

---

![image](https://hackmd.io/_uploads/B1rF7XFU0.png)|
:---:|

$$
\boxed{
\begin{CD}
\text{Sequence} \\
@VVV \\
\text{Image} @>\text{GradCAM}>> \text{Depth Indicator} \\
@VV\text{DCNN}V \\
\text{Embedding} \\
@VV\text{Object Detection}V \\
\text{Bounding Box} \\
@VV\text{Motion Detection}V \\
\text{Skeleton Joint} \\
@VV\text{RNN}V \\
\text{Safety score} @>\text{Thresholding}>> \text{Drowning warning}
\end{CD}}
$$

![image](https://hackmd.io/_uploads/Bkz0smKLC.png)|
:---:|
The **bounding box** indicates the position of the person.

![Screenshot 2024-06-21 141122](https://hackmd.io/_uploads/SJ1eCXY8C.png)|
:---:|
**Instance segmentation** separates the person from the background.

![Screenshot 2024-06-21 160732](https://hackmd.io/_uploads/Byu40QKIA.png)|
:---:|
**Demo**: Input = bounding box, Output = skeleton

![Screenshot 2024-06-26 081748](https://hackmd.io/_uploads/SkdHyEKLC.png)
![pose_2024_06_21_16_04_09](https://hackmd.io/_uploads/SJ_-lVY80.png)

**Demo**: Input = bounding box, Output = skeleton

![image](https://hackmd.io/_uploads/S1FrW4KLA.png)|
:---:|
**Wave detection**

![image](https://hackmd.io/_uploads/SJ2ObNFIR.png)|
:---:|
**Integrated depth detection**

![image](https://hackmd.io/_uploads/r1uGM4Y8R.png)

- Lift weight.
- Which type of battery?
- Which propeller to use, and how it behaves: d x pitch (twist: not a concern).
- The pitch angle works like a gearbox: the larger the pitch angle, the smaller the gearbox (aeronautical engineering).
- Angle of attack.
