[TOC]
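# Data Visualization Course Project
## Project Goal
All test data in this project comes from CIFAR-10 (the code below loads it through `keras.datasets.cifar10`), a dataset of images that are already sorted into classes. The goal is to classify the images, present the results as line charts or scatter plots, and let users interact with those charts.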

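[Test set source](https://www.cs.toronto.edu/~kriz/cifar.html)
## **Interactive charts on the dashboard:**
### Part 1: Plotting the images as a scatter plot
**The first chart is the interactive interface itself. The user clicks "Select the type" to choose which chart to display (e.g. PCA, LIME explanation...).**
Objects are placed on the plot by two features, "distance from the sea" and "altitude"; a ship, for example, is placed far from the sea and at low altitude. Clicking a point on the plot shows which object that point represents.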


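### Part 2: PCA plots for different layers
PCA (Principal Component Analysis) is a common dimensionality-reduction technique. It reduces the number of dimensions in a dataset while keeping most of its information, which makes the data easier to understand and analyze. The user can select different layers of the convolutional network and view the PCA result for each one. The PCA of Layer5 shows a clearer tendency than the PCA of Layer1, which suggests that the deeper layers come closer to the expected result.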
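
As a minimal sketch of what this projection does, the NumPy snippet below centers a small matrix and projects it onto its first two principal directions through an SVD. The helper name `pca_2d` and the random stand-in for flattened layer activations are illustrative assumptions; the dashboard itself relies on `sklearn.decomposition.PCA`.

```python
import numpy as np

def pca_2d(features):
    """Project the rows of `features` onto their first two principal components."""
    centered = features - features.mean(axis=0)   # PCA works on zero-mean columns
    # Rows of vt are the principal directions, ordered by explained variance
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:2].T               # shape: (n_samples, 2)
    explained = singular_values[:2] ** 2 / np.sum(singular_values ** 2)
    return projected, explained

# Hypothetical stand-in for flattened layer activations: 100 samples, 8 features
rng = np.random.default_rng(0)
activations = rng.normal(size=(100, 8))
points, ratio = pca_2d(activations)
print(points.shape)   # (100, 2)
```

The `ratio` values give the share of variance kept by each of the two components, which is one way to judge how much a 2-D plot of a layer can show.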


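### Part 3: LIME explanation for a selected image
LIME (Local Interpretable Model-agnostic Explanations) is a method for explaining the predictions of a machine-learning model. For an image, it highlights the regions that the model treats as important. We show this explanation on the dashboard.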
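
The idea behind LIME can be sketched in a few lines: switch image segments on and off at random, query the model on each perturbed sample, and fit a weighted linear model whose coefficients rank the segments. The snippet below is a simplified illustration under stated assumptions, not the `lime` library's implementation: `toy_model`, `lime_sketch`, the proximity kernel, and the plain least-squares fit are all hypothetical stand-ins.

```python
import numpy as np

def toy_model(masks):
    # Hypothetical black box: the score depends mostly on segment 2, slightly on segment 0
    return 0.8 * masks[:, 2] + 0.1 * masks[:, 0]

def lime_sketch(predict_fn, n_segments, n_samples=200, seed=0):
    """Estimate per-segment importance with a weighted local linear surrogate."""
    rng = np.random.default_rng(seed)
    # Each row is a perturbed sample: 1 keeps a segment, 0 hides it
    masks = rng.integers(0, 2, size=(n_samples, n_segments)).astype(float)
    masks[0] = 1.0                                  # include the unperturbed image
    scores = predict_fn(masks)
    # Samples that hide fewer segments are closer to the original and weigh more
    distance = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(distance ** 2) / 0.25)
    # Weighted least squares with an intercept column
    design = np.hstack([masks, np.ones((n_samples, 1))])
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(design * sw, scores * sw[:, 0], rcond=None)
    return coef[:-1]                                # drop the intercept

importance = lime_sketch(toy_model, n_segments=4)
print(int(np.argmax(importance)))   # 2
```

In the real library the segments come from an image segmentation algorithm and the surrogate is a regularized regression, but the ranking step works in the same spirit.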

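### Part 4: Classification probability distribution for a selected image
The probability distribution chart shows how the model classifies the object. In the example figure, the model gives the highest probability to class 4 and the next highest to class 6, and the prediction is correct. When a prediction is wrong, the same chart shows whether the class with the second-highest probability is the correct one, which indicates how close the model is to the right answer.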
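
The check described above can be sketched as a small helper that reports where the true class ranks in the predicted distribution. The function name `rank_of_true_class` and the probability values are made up for illustration; class indices are 0-based here, so class 4 is index 3.

```python
import numpy as np

def rank_of_true_class(probabilities, true_class):
    """Return 1 if the true class has the highest probability, 2 if second-highest, and so on."""
    order = np.argsort(probabilities)[::-1]        # class indices, most probable first
    return int(np.where(order == true_class)[0][0]) + 1

# Hypothetical softmax output for one image over 10 classes
probs = np.array([0.02, 0.03, 0.05, 0.40, 0.05, 0.30, 0.05, 0.04, 0.03, 0.03])
print(rank_of_true_class(probs, true_class=3))   # 1: the prediction is correct
print(rank_of_true_class(probs, true_class=5))   # 2: wrong, but the runner-up is right
```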

# Implementation
### **Part 1: Imports and data loading**
```python=
import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import to_categorical
from keras.optimizers import Adam
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries
from skimage.segmentation import slic
from skimage.color import gray2rgb
from lime.wrappers.scikit_image import SegmentationAlgorithm
from sklearn.decomposition import PCA
from dash import Dash, dcc, html, Input, Output
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import dash_bootstrap_components as dbc
import pickle
import pandas as pd
# Load CIFAR-10 data
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()
# Preprocess the data
train_images = train_images.astype('float32') / 255.0
test_images = test_images.astype('float32') / 255.0
# One-hot encode the labels
train_labels = to_categorical(train_labels, 10)
test_labels = to_categorical(test_labels, 10)
```
### **Part 2: Training on the dataset**
The model for this project is a Keras `Sequential` network.
```python=
# Model definition
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
# Model compilation
model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
# Model training (change the number of epochs here)
history = model.fit(train_images, train_labels, epochs=1, batch_size=64, validation_data=(test_images, test_labels))
```
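
To see why the `Flatten` layer feeds 1024 values into `Dense(64)`, the arithmetic below traces the spatial size through the stack. It assumes the Keras defaults used above (stride 1 and `'valid'` padding for `Conv2D`, non-overlapping windows for `MaxPooling2D`); the helper names `conv_out` and `pool_out` are illustrative.

```python
def conv_out(size, kernel=3):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

size = 32                       # CIFAR-10 images are 32x32
size = conv_out(size)           # Conv2D(32): 30
size = pool_out(size)           # MaxPooling2D: 15
size = conv_out(size)           # Conv2D(64): 13
size = pool_out(size)           # MaxPooling2D: 6
size = conv_out(size)           # Conv2D(64): 4
flattened = size * size * 64    # Flatten: 4 * 4 * 64 values go into Dense(64)
print(size, flattened)          # 4 1024
```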
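### **Part 3: Preparing the data for the charts**
Convert the image arrays into the format Plotly needs, and assign X and Y coordinates that suit each class so the samples can be shown on the coordinate plot.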
```python=
def unpickle(file):
    with open(file, 'rb') as fo:
        data = pickle.load(fo, encoding='bytes')
    return data

# Note: data_batch_1 is no longer the image source; exercise 4 uses CIFAR-10 from Keras.
# Its labels still feed the scatter and bar plots below.
batch_file = 'data_batch_1'
batch_data = unpickle(batch_file)
data = batch_data[b'data']
labels = batch_data[b'labels']
final_data = []
for i1 in range(36):  # Adjust the first 36 images to shape 32x32x3
    new_data = []
    for i in range(1024):
        # Each row stores 1024 red, then 1024 green, then 1024 blue values
        new_data.append([data[i1][i], data[i1][i+1024], data[i1][i+2048]])
    sec_data = []
    for i in range(32):
        sec_data.append(new_data[32*i:32+32*i])
    final_data.append(sec_data)
data = {'Distance from sea (m)': [], 'Distance from ground (m)': [], 'Class': [], 'index': []}
data2 = {'Distance from sea (m)': [], 'Distance from ground (m)': [], 'Class': [], 'index': []}
classes = ['Class 1', 'Class 2', 'Class 3', 'Class 4', 'Class 5','Class 6', 'Class 7','Class 8','Class 9','Class 10']
# Hand-assigned coordinates for the 36 samples
X = [-10, -100, -120, -300, -125, -127, -155, -300, 150, -200, -350, -306, -289, -345, -125, -128, -136, -208, -338, -18, -311, -211, -15, -18, -178, -16, -220, -225, -305, 0, -100, -126, -130, -210, -315, 100]
Y = [80, 100, 100, 150, 100, 100, 80, 200, 0, 100, 103, 200, 200, 103, 100, 100, 100, 100, 103, 80, 156, 100, 68, 75, 89, 70, 115, 110, 150, 1000, 1000, 100, 100, 115, 150, 1000]
# df: class names as strings (discrete colors)
indexx = 0
for class_label in labels[0:36]:
    data['Distance from sea (m)'].append(X[indexx])
    data['Distance from ground (m)'].append(Y[indexx])
    data['Class'].append(classes[class_label])
    data['index'].append(indexx)
    indexx += 1
# df2: class numbers as integers (used by the clickable scatter plot)
indexx = 0
for class_label in labels[0:36]:
    data2['Distance from sea (m)'].append(X[indexx])
    data2['Distance from ground (m)'].append(Y[indexx])
    data2['Class'].append(int(class_label)+1)
    data2['index'].append(indexx)
    indexx += 1
df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
```
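
The nested loops above turn each 3072-value row (1024 red, 1024 green, 1024 blue values) into a 32x32x3 image. As a hedged alternative, the same conversion can be written with one NumPy `reshape` and `transpose`. The helper name `rows_to_images` and the synthetic `fake_rows` input are assumptions for illustration, so the snippet runs without the dataset file.

```python
import numpy as np

def rows_to_images(rows):
    """Convert CIFAR rows of 3072 values (R plane, G plane, B plane) to (N, 32, 32, 3)."""
    rows = np.asarray(rows)
    # (N, 3072) -> (N, channel, row, col) -> (N, row, col, channel)
    return rows.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)

# Synthetic stand-in for batch_data[b'data'][:2]
fake_rows = np.arange(2 * 3072).reshape(2, 3072)
images = rows_to_images(fake_rows)
print(images.shape)                 # (2, 32, 32, 3)
print(images[0, 0, 1].tolist())     # [1, 1025, 2049]: R, G, B of the pixel at row 0, column 1
```

The result is a NumPy array rather than nested Python lists, which `px.imshow` accepts directly.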
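### **Part 4: Building the dashboard (plotting tool)**
Define the dashboard layout and register the callbacks `update_pca_display` and `update_dynamic_plot`, which redraw the charts whenever the user changes one of the controls.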
```python=
app = Dash(__name__, external_stylesheets=[dbc.themes.VAPOR])
app.layout = dbc.Container([  # Set the layout of the dashboard
    dbc.Row([
        dbc.Col([
            html.H1("DVAI-Ex4"),
            dcc.Graph(id='dynamic-plot'),
            dcc.Dropdown(
                id='plot-type-dropdown',
                options=[{'label': 'Select the type', 'value': 'Select the type'}] + [{'label': p, 'value': p} for p in ['Scatter Plot', 'Clickable Scatter Plot', 'Bar Plot']],
                value='Select the type',
                clearable=False),
            html.Label('Point size (mm):'),
            dcc.Slider(
                id='visualization-slider',
                min=0.5,
                max=2,
                step=0.1,
                value=1,
                marks={i/10: str(i/10) for i in range(5, 21)},
                tooltip={'placement': 'bottom'}),
            html.Label('Image (index):'),
            dcc.Slider(
                id='image-slider',
                min=0,
                max=len(final_data) - 1,
                step=1,
                value=0,
                marks={i: str(i) for i in range(len(final_data))},
                tooltip={'placement': 'bottom'}),
            dcc.Graph(id='image-display')])]),
    dbc.Row([
        dbc.Col([
            html.Div(
                html.H2("PCA", style={'textAlign': 'center', 'width': '100%', 'margin': '10px 0', 'color': 'yellow', 'fontSize': '40px'}),
                style={'width': '100%', 'textAlign': 'center'}
            ),
            dcc.Dropdown(
                id='pca-dropdown',
                options=[{'label': 'Select the type', 'value': 'Select the type'}] + [{'label': p, 'value': p} for p in ['Layer1', 'Layer2', 'Layer3', 'Layer4', 'Layer5']],
                value='Layer1',
                clearable=False),
            dcc.Graph(id='pcaimage-display')])]),
    dbc.Row([
        dbc.Col([
            html.Div(
                html.H2("LIME explanation", style={'textAlign': 'center', 'width': '100%', 'margin': '10px 0', 'color': 'yellow', 'fontSize': '40px'}),
                style={'width': '100%', 'textAlign': 'center'}
            ),
            dcc.Graph(id='salimage-display')], width=6),
        dbc.Col([
            html.Div(
                html.H2("Probability", style={'textAlign': 'center', 'width': '100%', 'margin': '10px 0', 'color': 'yellow', 'fontSize': '40px'}),
                style={'width': '100%', 'textAlign': 'center'}
            ),
            dcc.Graph(id='proimage-display')], width=10)])
])
@app.callback(  # Callback that redraws the PCA graph
    Output('pcaimage-display', 'figure'),
    [Input('pca-dropdown', 'value')]
)
def update_pca_display(selected_pca_type):
    # Map each dropdown label to its layer index. The placeholder option
    # 'Select the type' falls back to Layer1, so a figure is always returned.
    layer_index = {'Layer1': 0, 'Layer2': 1, 'Layer3': 2, 'Layer4': 3, 'Layer5': 4}.get(selected_pca_type, 0)
    return pca_show(layer_index)
@app.callback(
    [Output('dynamic-plot', 'figure'), Output('image-display', 'figure'), Output('salimage-display', 'figure'), Output('proimage-display', 'figure'), Output('image-slider', 'value')],
    [Input('plot-type-dropdown', 'value'), Input('dynamic-plot', 'clickData'), Input('visualization-slider', 'value'), Input('image-slider', 'value')]
)
def update_dynamic_plot(selected_plot_type, click_data, slider_value, image_index):
    if selected_plot_type == 'Clickable Scatter Plot':
        # df2 stores Class as integers, so Plotly draws a single trace and
        # pointNumber matches the row index of the clicked sample
        fig = px.scatter(df2, x='Distance from sea (m)', y='Distance from ground (m)', color='Class', title='Clickable Scatter Plot')
        if click_data is not None:
            clicked_index = click_data['points'][0]['pointNumber']
            image_index = clicked_index
    elif selected_plot_type == 'Scatter Plot':
        fig = px.scatter(df, x='Distance from sea (m)', y='Distance from ground (m)', color='Class', title='Scatter Plot', category_orders={'Class': classes})
    elif selected_plot_type == 'Bar Plot':
        class_counts = df['Class'].value_counts().reset_index()
        class_counts.columns = ['Class', 'The number of each class']
        class_counts = class_counts.sort_values(by='Class', key=lambda x: x.map({k: i for i, k in enumerate(classes)}))
        fig = px.bar(class_counts, x='Class', y='The number of each class', color='Class', title='Bar Plot')
    else:
        # Default view while the placeholder option is selected
        fig = px.scatter(df, x='Distance from sea (m)', y='Distance from ground (m)', color='Class', title='Scatter Plot', category_orders={'Class': classes})
    if selected_plot_type in ['Scatter Plot', 'Clickable Scatter Plot', 'Select the type']:
        fig.update_traces(marker=dict(size=10 * slider_value))
    fig2 = px.imshow(np.array(test_images[image_index]))
    fig3 = limeee(image_index)
    fig4 = predicting(image_index)
    return fig, fig2, fig3, fig4, image_index
```
### **Part 5: Using LIME to explain how the model extracts features**
```python=
def limeee(num):
    # Choose an image for explanation
    image_to_explain = test_images[num]
    # Create a LIME explainer for image data
    explainer = lime_image.LimeImageExplainer()
    segmenter = SegmentationAlgorithm('quickshift', kernel_size=1, max_dist=200, ratio=0.2)
    # Explain the model's prediction using LIME
    explanation = explainer.explain_instance(image_to_explain, model.predict, top_labels=3, hide_color=0, num_samples=1000, segmentation_fn=segmenter)
    # top_labels is ordered from most to least probable, so index 0 is the predicted class
    temp, mask = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=True, num_features=20, hide_rest=False)
    # The images are already scaled to [0, 1], so they are drawn without further rescaling
    img_boundry = mark_boundaries(temp, mask, color=(1, 1, 0))
    # Display the LIME explanation with boundaries using Plotly Express
    fig_boundaries = px.imshow(img_boundry)
    fig_boundaries.update_layout(title_text="LIME Explanation with Boundaries")
    fig_boundaries.update_xaxes(visible=False)
    fig_boundaries.update_yaxes(visible=False)
    return fig_boundaries
```
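### **Part 6: Displaying the PCA plot**
Visualize the model's activations after dimensionality reduction. A dropdown menu lets the user view the PCA plot of different layers.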
```python=
def pca_show(num):
    # Get layer names dynamically
    layer_names = [layer.name for layer in model.layers if 'conv2d' in layer.name or 'dense' in layer.name]
    # Build a model that outputs only the selected layer's activations,
    # so the other layers are not recomputed on every call
    selected_output = model.get_layer(layer_names[num]).output
    activation_model = keras.models.Model(inputs=model.input, outputs=selected_output)
    activations = activation_model.predict(test_images)
    # Flatten each sample's activations and reduce them to two components
    activation_pca = PCA(n_components=2).fit_transform(activations.reshape(activations.shape[0], -1))
    fig = px.scatter(
        x=activation_pca[:, 0],
        y=activation_pca[:, 1],
        color=np.argmax(test_labels, axis=1),
        labels={'color': 'Classes'},
        title=f'PCA Visualization of {layer_names[num]} Activations')
    fig.update_layout(
        xaxis_title='Principal Component 1',
        yaxis_title='Principal Component 2',
    )
    return fig

# Start the server only after every function used by the callbacks is defined.
# In a single script, move these two lines below predicting() from Part 7.
if __name__ == '__main__':
    app.run_server(port=8054)
```
### **Part 7: Displaying the model's class probability distribution**
```python=
def predicting(num):
    # Make predictions on the selected image
    predictions = model.predict(test_images[num].reshape((1, 32, 32, 3)))
    predicted_class = np.argmax(predictions)
    true_class = np.argmax(test_labels[num])  # Ground-truth class of the selected test image (0-based)
    # Check if the prediction is correct
    is_correct = predicted_class == true_class
    # Display the true class and prediction status (Plotly text uses <br> for line breaks)
    fig_true_class = go.Figure()
    fig_true_class.add_trace(go.Scatter(x=[0], y=[0], mode='text', text=[f"True Class: {true_class+1}<br><br>Predicted Class: {predicted_class+1}<br><br>Correct Prediction: {is_correct}"]))
    fig_true_class.update_layout(title_text='True Class and Prediction Status')
    fig_true_class.update_xaxes(visible=False)
    fig_true_class.update_yaxes(visible=False)
    # Visualize the prediction distribution
    fig_prediction_dist = px.bar(x=list(range(1,11)), y=predictions.flatten(), color_discrete_sequence=['blue'], opacity=0.7, labels={'x': 'Class', 'y': 'Probability'})
    fig_prediction_dist.update_layout(title_text='Prediction Distribution')
    fig_prediction_dist.update_xaxes(title_text='Class')
    fig_prediction_dist.update_yaxes(title_text='Probability')
    # Combine the subplots using make_subplots
    fig = make_subplots(rows=2, cols=1, subplot_titles=['True Class and Prediction Status', 'Prediction Distribution'])
    fig.add_trace(fig_true_class.data[0], row=1, col=1)
    fig.add_trace(fig_prediction_dist.data[0], row=2, col=1)
    # Mark the true class on the combined figure: add_trace copies only the traces,
    # so a shape added to fig_prediction_dist would be lost. Bars are labeled 1-10, hence +1.
    fig.add_shape(
        type='line',
        x0=int(true_class) + 1,
        x1=int(true_class) + 1,
        y0=0,
        y1=1,
        line=dict(color='red', dash='dash', width=1),
        row=2, col=1
    )
    fig.update_layout(height=600, title_text="")
    return fig
```