# **RECOMMENDER SYSTEM: MATCHING CLOTHES**
:::success
### **A. OVERVIEW**
:::
#### <span class="green">**1. TEAM MEMBERS:**</span>
Full name | Gmail Address | CoTAI Class
:---: | :---: | :---:
Tran Chanh Hy | hychanhtran@gmail.com | ML4AI
- **Colab Notebook:** [Colab Notebook](https://colab.research.google.com/drive/1S1QKNU2sbaQZzXk2FtNS0PkWNbaWlJ2D?usp=sharing)
- **Github:** [Final-Project-ML4AI](https://hackmd.io/_uploads/S1m06abQC.png)
- **Video demo:** [Coming soon]()
#### <span class="green">**2. PROJECT INFORMATION:**</span>
- **Task** $\mathcal T$: suggest clothing items that match each other.
- **Experience** $\mathcal E$:
- *Amazon_Metadata_2018:* [Meta Data On Fashion](https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/)
- *FashionVC:* [Data On Matching Tops and Bottoms](https://github.com/wangyu-ustc/PairFashionExplanation/tree/main)
- *Indico Pretrained Model:* [Clothes Matching Pretrained Model](https://indicodata.ai/blog/fashion-matching-tutorial/)
- *Img2Vec Pretrained Model:* [Vectorizing images Pretrained Model](https://github.com/christiansafka/img2vec)
- **Function space** $\mathcal F$: `CNN`, `kMeans`, `kNN`
- **Performance** $\mathcal P$: `cross-entropy loss | accuracy`
- **Algorithm** $\mathcal A$:
- Prediction: `cosine_similarity` + `thresholding`
#### <span class="green">**3. MASTER PLAN**</span>
Timeframe |Work |Progress|Note|
:---------------------:|:----------------------------------|:------:|:--:|
April 26th - April 27th|Searching for dataset |Done |3 datasets|
April 28th - May 04th |Preprocessing dataset |Done |
May 05th - May 06th |Search for model: Vectorize images |Done |CNN + kNN|
May 06th - May 08th |Build CNN: Category classification |Working |For RecSys|
May 08th - May 09th |Build kNN: Find matching items |Working |Same category|
May 09th - May 11th |Build Streamlit Interface |Working |
May 11th - May 13th |Prepare PPT presentation + ReadMe |Working |
:::success
### **B. PRE-PROCESSING DATA**
:::
- **Dataset source:**
- [meta_Clothing_Shoes_and_Jewelry_2018](https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/)
- [FashionVC](https://github.com/wangyu-ustc/PairFashionExplanation/tree/main)
- **Dataset Drive:** [Drive](https://drive.google.com/drive/folders/1IRR5t_YsvdAuKIRagbfxvm0ZUk1TSpQM?usp=drive_link)
(add a shortcut to your own Drive to access the datasets).
#### <span class="green">**1. PROCESSED DATA INFO:**</span>
```python!
RangeIndex: 52456 entries, 0 to 61514
Data columns (total 6 columns):
0 asin 52456 non-null object
1 title 52456 non-null object
2 imageURLHighRes 52456 non-null object
3 also_buy 52456 non-null object
4 category 52456 non-null object
5 ground_truth 52456 non-null object
```
#### <span class="green">**2. LOAD CLEAN DATA:**</span>
```python
from google.colab import drive
import pandas as pd
drive.mount('/content/drive')
df = pd.read_csv("/content/drive/MyDrive/Dataset/clean_Dataset.csv")
```
:::success
### **C. MACHINE LEARNING ALGORITHM**
:::
- **Reference:** [Matching Clothes For E-commerce](https://www.youtube.com/watch?v=tLE7EoCKaBw&list=LLPe1RJhd3nJM9zarwgJMVPA&index=2&t=211s)
- **General framework:**
$$
\begin{CD}
~~\text{Product Image} @>\text{Img2Vec}>\text{or CNN}> \text{Image Embeddings}@>\text{Cosine Similarity}>\text{kNN}> \text{Product - to - product matrix} \\
@. @V\text{kMeans}V\text{and/or MLP}V @V\text{Threshold}VV\\
@. \text{Category} @.\text{Prediction}
\end{CD}
$$
#### <span class="green">**1. FEATURE EXTRACTION**</span>
First **`vectorize`** the images with a **pretrained model**, then try a self-built **`CNN`**:
- **Pretrained model:** [PyTorch Img2Vec](https://github.com/christiansafka/img2vec)
- **Model architecture:**
```python
conv1 = nn.Conv2d(3, 64, kernel_size = 7, stride = 2, padding = 3, bias = False)
bn1 = nn.BatchNorm2d(64)
relu = nn.ReLU(inplace = True)
maxpool = nn.MaxPool2d(kernel_size = 3, stride = 2, padding = 1)
layer1 = self._make_layer(block, 64, layers[0])
layer2 = self._make_layer(block, 128, layers[1], stride = 2)
layer3 = self._make_layer(block, 256, layers[2], stride = 2)
layer4 = self._make_layer(block, 512, layers[3], stride = 2)
avgpool = nn.AvgPool2d(7)
fc = nn.Linear(512 * block.expansion, num_classes)
```
- **Load `pretrained_model_embeddings`:**
```python
import pickle
with open('/content/drive/MyDrive/Dataset/pretrained_model_embeddings.pkl', 'rb') as f:
    pre_embeddings = pickle.load(f)
```
><span class="green">**IMPROVEMENT:**</span>
- Build a **`CNN`** model with **`softmax`** activation for **feature extraction**:
```python
Model: "sequential_6"
___________________________________________________________
Layer (type) Output Shape Param #
===========================================================
Rescaling (None, 512, 512, 3) 0
Conv2D (None, 512, 512, 64) 1728
Conv2D (None, 512, 512, 64) 36864
BatchNormalization (None, 512, 512, 64) 256
Activation: ReLu (None, 512, 512, 64) 0
MaxPooling2D (None, 256, 256, 64) 0
Dropout (None, 256, 256, 64) 0
Conv2D (None, 256, 256, 32) 2048
Conv2D (None, 256, 256, 128) 102400
BatchNormalization (None, 256, 256, 128) 512
Activation: ReLu (None, 256, 256, 128) 0
MaxPooling2D (None, 128, 128, 128) 0
Dropout (None, 128, 128, 128) 0
Conv2D (None, 128, 128, 64) 8192
Conv2D (None, 128, 128, 256) 409600
BatchNormalization (None, 128, 128, 256) 1024
Activation: ReLu (None, 128, 128, 256) 0
```
- **Load** `CNN_embeddings`:
```python
with open('/content/drive/MyDrive/Dataset/CNN_embeddings.pkl', 'rb') as f:
    CNN_embeddings = pickle.load(f)
```
#### <span class="green">**2. CLASSIFICATION INTO CATEGORIES**</span>
##### <span class="green">**$\longrightarrow~$ KMEANS ALGORITHM**</span>
- **Support library:**
- *KMeans:* [sklearn.cluster.KMeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html)
- *PCA:* [sklearn.decomposition.PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)
- *Plotly:* [plotly.express](https://plotly.com/python/plotly-express/)
- **Apply `PCA`** to reduce the `embeddings` to 3 dimensions:
```python
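from sklearn.decomposition import PCA

n_dimension = 3
# Fit PCA on all components, then keep only the first n_dimension columns.
pca = PCA().fit(embeddings)
embedding_pca = pca.transform(embeddings)[:, 0:n_dimension]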
```
$\boxed{\text{MSE for PCA: }~0.98764}$
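The boxed value is reported without a formula. One plausible reading is the reconstruction error after keeping 3 components. The NumPy sketch below works under that assumption, with synthetic data standing in for the real `embeddings`; the function name `pca_reconstruction_mse` and the 512-dimensional input are hypothetical.

```python
import numpy as np

def pca_reconstruction_mse(embeddings, n_dimension=3):
    """Project onto the top principal axes, reconstruct, and return (projection, MSE)."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_dimension]
    projected = centered @ components.T            # shape: (n_samples, n_dimension)
    reconstructed = projected @ components + mean  # back in the original space
    mse = float(np.mean((embeddings - reconstructed) ** 2))
    return projected, mse

# Synthetic stand-in for the image embeddings (assumption: 512-d vectors).
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(200, 512))
embedding_pca, mse = pca_reconstruction_mse(fake_embeddings, n_dimension=3)
print(embedding_pca.shape, round(mse, 4))
```

A high reconstruction error would mean the 3-D view is useful for plotting but discards most of the information in the embeddings, so clustering on the full vectors may behave differently from what the plot suggests.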
- **Visualize** the clustering:
The dataset is best divided into **7 categories**.
- **Visualize** images:
Cluster | Classification quality
:---|:---:
Accessories | Good
Shoes | Good
Non-clothing items | Good
3 mixed clusters | Bad
- On general inspection, `kMeans` seems to treat all classes equally, so it separates the categories poorly here: 3 of the 7 clusters mix several kinds of items.
- However, the clustering shows that the dataset includes items that are not clothes. Let's see how `MLP` handles this.
##### <span class="green">**$\longrightarrow~$ MLP ALGORITHM**</span>
- **Split** the dataset into a training set (60%) and a test set (40%); the test set is also used for validation:
```python
x_train, x_test, y_train, y_test = train_test_split(all_Images, ground_truths,
stratify = ground_truths,
test_size = 0.4, shuffle = True,
random_state = 42)
```
- **Create a `MLP` prediction layer** using `softmax` for classification:
```python
Model: "sequential"
_______________________________________________
Layer Output Shape Param #
===============================================
Dense (None, 512, 64) 32832
Dense (None, 512, 32) 2080
Dense (None, 512, 7) 231
===============================================
Total params: 35143 (137.28 KB)
Trainable params: 35143 (137.28 KB)
Non-trainable params: 0 (0.00 Byte)
```
- **Fit** the model with training data using:
- loss = `categorical_crossentropy`
- optimizer = `adam`
- metrics = `accuracy`
- validation = `test_set`
```python
# categorical_crossentropy pairs with the 7-way softmax output
# (assumes one-hot labels; use sparse_categorical_crossentropy for integer labels).
model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
history = model.fit(PIL_image, ground_truth,
                    validation_data = (x_test, y_test), epochs = 5, verbose = 1)
```
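The Performance entry lists cross-entropy loss and accuracy. As a reference for what those two quantities measure on a 7-way softmax output, here is a small NumPy sketch. The logits and labels are made-up values, and the helper names are hypothetical; this is not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_cross_entropy(probs, one_hot_labels, eps=1e-12):
    """Mean over samples of -sum(y * log(p))."""
    return float(-np.mean(np.sum(one_hot_labels * np.log(probs + eps), axis=1)))

# Two made-up samples over 7 categories (illustrative values only).
logits = np.array([[2.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
                   [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0]])
labels = np.eye(7)[[0, 2]]  # one-hot rows for classes 0 and 2

probs = softmax(logits)
loss = categorical_cross_entropy(probs, labels)
accuracy = float(np.mean(probs.argmax(axis=1) == labels.argmax(axis=1)))
print(probs.sum(axis=1), loss, accuracy)
```

Each softmax row sums to 1, so the loss only rewards probability placed on the true category. Accuracy only checks whether the largest probability falls on it.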
- **Plot** **`accuracy | loss`** per `epoch`:
<span class="red">Plot here</span>
#### <span class="green">**3. CREATE SIMILARITY_MATRIX**</span>
- Support library:
- *Cosine_Similarity*: [sklearn.metrics.pairwise](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html)
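- *kNN*: [sklearn.neighbors.KNeighborsClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)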
- *Heatmap*: [seaborn](https://seaborn.pydata.org/generated/seaborn.heatmap.html)
- **Calculate** the `cosine_similarity` between products:
```python
product_to_product_similarity_matrix = cosine_similarity(embeddings)
```
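To make the computation concrete, here is a NumPy sketch of what a pairwise cosine similarity amounts to: normalize each row to unit length, then take dot products. The function name `cosine_similarity_matrix` and the toy vectors are illustrative assumptions, not part of the project code.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Return the (n, n) matrix of cosine similarities between rows."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

# Toy 2-D "embeddings": rows 0 and 1 point the same way, row 2 is orthogonal to them.
toy = np.array([[1.0, 0.0],
                [2.0, 0.0],
                [0.0, 3.0],
                [1.0, 1.0]])
sim = cosine_similarity_matrix(toy)
print(np.round(sim, 3))
```

For the 52,456 products listed in the processed data, a dense float64 matrix would need roughly 22 GB. Computing similarities for one query row at a time may be more practical than materializing the full matrix.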
- **Load** cosine_similarity:
```python
with open('/content/drive/MyDrive/Dataset/Pretrained_model_cosine_similarity.pkl', 'rb') as f:
    product_to_product_similarity_matrix = pickle.load(f)
```
- **Visualize** the similarity between products:
<span class="red">Heatmap here</span>
- Now, to find items that match a piece of clothing, follow the **prediction pipeline:**
- Upload an *image* to test (here, a randomly picked image).
- Find the 5 nearest items with `kNN`.
- Check the `cosine_similarity` between the uploaded image and the `also_buy` items of those 5 neighbors:
$$
\boxed{\text{Candidate item is }
\left \lbrace
\begin{array}{ll}
\text{matching} & \text{if similarity} > \text{threshold} \\
\text{skipped} & \text{if similarity} \le \text{threshold}
\end{array}
\right.}
$$
- Apply **`kNN`** to find the most similar items:
```python
from sklearn.neighbors import KNeighborsClassifier

KNN = KNeighborsClassifier(n_neighbors = 5)
KNN.fit(embeddings, also_buy)
```
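The prediction pipeline above can be sketched end to end in NumPy. This is a minimal illustration under stated assumptions, not the project's implementation. Here `also_buy` is a list of row indices per product, whereas the dataset column would first need mapping to row positions. The function `recommend`, the helper `unit_rows`, the random embeddings and the threshold values are all hypothetical.

```python
import numpy as np

def unit_rows(x):
    """Scale vectors to unit length along the last axis."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

def recommend(query, embeddings, also_buy, k=5, threshold=0.5):
    """Return (candidate_index, similarity) pairs that pass the threshold."""
    sims = unit_rows(embeddings) @ unit_rows(query)  # cosine similarity to every product
    neighbors = np.argsort(-sims)[:k]                # the k most similar products
    # Pool the also_buy items of the neighbors, without duplicates.
    candidates = sorted({j for i in neighbors for j in also_buy[i]})
    matches = [(j, float(sims[j])) for j in candidates if sims[j] > threshold]
    return sorted(matches, key=lambda pair: -pair[1])

# Toy data: 20 products with 8-d embeddings and two also_buy links each.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(20, 8))
also_buy = [[(i + 1) % 20, (i + 7) % 20] for i in range(20)]
query = embeddings[3] + 0.01 * rng.normal(size=8)    # an "uploaded image" close to product 3

result = recommend(query, embeddings, also_buy, k=5, threshold=0.0)
for index, score in result:
    print(index, round(score, 3))
```

Raising `threshold` trades recall for precision: fewer candidates pass, but those that do are closer to the query in embedding space.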
#### <span class="green">**4. EVALUATE MODEL**</span>
- Scores obtained:
-- | MLP | kMeans |
:---:|:---:|:---:|
**CNN** | |
**Img2Vec** | 0.83 | 0.24
- Evaluating with pretrained model: [Indico](https://indicodata.ai/blog/fashion-matching-tutorial/)
<span class="red">To be updated soon</span>
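The table does not say how the `kMeans` score was computed. One common way to score an unsupervised clustering against ground-truth categories is majority-vote accuracy: label each cluster with its most frequent true category and count the agreeing items. The sketch below assumes that approach; the function name and the toy labels are hypothetical.

```python
import numpy as np

def majority_vote_accuracy(cluster_ids, true_labels):
    """Fraction of items whose true label equals their cluster's most frequent label."""
    cluster_ids = np.asarray(cluster_ids)
    true_labels = np.asarray(true_labels)
    correct = 0
    for cluster in np.unique(cluster_ids):
        members = true_labels[cluster_ids == cluster]
        _, counts = np.unique(members, return_counts=True)
        correct += counts.max()  # items that agree with the cluster's majority label
    return correct / len(true_labels)

# Toy example: 8 items, 3 clusters, 3 true categories.
clusters = [0, 0, 0, 1, 1, 1, 2, 2]
labels = ["top", "top", "shoes", "shoes", "shoes", "bag", "bag", "bag"]
score = majority_vote_accuracy(clusters, labels)
print(score)  # 0.75
```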
<style>
.green {color: green;}
</style>
<style>
.red {color: red;}
</style>

---
$$
\boxed{
\begin{CD}
\text{Sequence} \\
@VVV \\
\text{Image} @>\text{GradCAM}>> \text{Depth Indicator} \\
@VV\text{DCNN}V \\
\text{Embedding} \\
@VV\text{Object Detection}V \\
\text{Bounding Box} \\
@VV\text{Motion Detection}V \\
\text{Skeleton Joint} \\
@VV\text{RNN}V \\
\text{Safety score} @>\text{Thresholding}>> \text{Drowning warning}
\end{CD}}
$$
- **Bounding box** indicates the position of the person.
- **Instance segmentation** separates the person from the background.
- **Demo**: input = bounding box, output = skeleton.
- **Wave detection**
- **Integrated depth detection**

lifting weight (payload)
what type of battery
which propeller to use, and how each one behaves
d x pitch (twist/tilt: not a concern)
the pitch angle is like a gearbox -> the larger the pitch angle, the smaller the gearbox (aeronautical engineering); angle of attack