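[TOC]

# Data Visualization Assignment Project

## Project goal:
All test data in this project comes from the CIFAR datasets (the code below loads CIFAR-10), which provide images already sorted into classes. The goal is to classify the images, present the results as line charts or scatter plots, and let users interact with them.

![image](https://hackmd.io/_uploads/ryQ4NF936.png)

[Test set source](https://www.cs.toronto.edu/~kriz/cifar.html)

## **Interactive charts on the Dash Board:**

### Part1: Grouping the images into a scatter plot
**The first figure is the whole interactive interface. The user clicks "Select the type" to choose which chart to display (e.g. PCA, LIME explanation...).**

We classify the objects by "distance from the sea" and "altitude of the location"; for example, a ship is placed far along the sea-distance axis and at low altitude. Users can click a point on the plot to see which object it represents.

![image](https://hackmd.io/_uploads/SkoYltq2T.png)
![image](https://hackmd.io/_uploads/SkXplYq3T.png)

### Part2: Showing the PCA plot of different layers
PCA (Principal Component Analysis) is a common dimensionality-reduction technique. It reduces the number of dimensions in a dataset while keeping most of the information, which makes the data easier to understand and analyze. Users can pick a layer of the convolutional network and view its PCA result. The PCA of Layer5 shows a clearer grouping than that of Layer1, which suggests the deeper layers are moving closer to the expected result.

![image](https://hackmd.io/_uploads/SJaxbKq26.png)
![image](https://hackmd.io/_uploads/BybC2t9nT.png)

### Part3: Showing the LIME explanation of a selected image
LIME (Local Interpretable Model-agnostic Explanations) is a method for explaining the predictions of a machine learning model. It highlights the image features the model considers important. We display the result on the Dash Board.

![image](https://hackmd.io/_uploads/Hks7-Fcn6.png)

### Part4: Showing the class probability distribution of a selected image
The probability distribution chart shows how the model classifies the object. In the figure below, the model considers the object most likely to be Class 4, followed by Class 6, and the result is correct. When a prediction is wrong, we can also use the chart to check whether the class with the second-highest probability is the correct one, which tells us how close the training is to being right.

![image](https://hackmd.io/_uploads/BJZPWKch6.png)

# Implementation code

### **Part1: Declarations and loading the data**
```python=
import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import to_categorical
from keras.optimizers import Adam
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries
from skimage.segmentation import slic
from skimage.color import gray2rgb
from lime.wrappers.scikit_image import SegmentationAlgorithm
from sklearn.decomposition import PCA
from dash import Dash, dcc, html, Input, Output
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import dash_bootstrap_components as dbc
import pickle
import pandas as pd

# Load CIFAR-10 data
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()

# Preprocess the data: scale pixel values to [0, 1]
train_images = train_images.astype('float32') / 255.0
test_images = test_images.astype('float32') / 255.0

# One-hot encode the labels
train_labels = to_categorical(train_labels, 10)
test_labels = to_categorical(test_labels, 10)
```

### **Part2: Training on the dataset**
A Keras `Sequential` model is used for training.
```python=
# Model definition
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Model compilation
model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

# Model training
# The number of epochs can be changed here
history = model.fit(train_images, train_labels, epochs=1, batch_size=64,
                    validation_data=(test_images, test_labels))
```

### **Part3: Preparing the data for the charts**
Convert the image arrays into the format Plotly needs, and assign X, Y coordinates that fit each class so the points can be shown on the coordinate plot.
```python=
def unpickle(file):
    # Read one pickled CIFAR batch file; keys are returned as bytes
    with open(file, 'rb') as fo:
        data = pickle.load(fo, encoding='bytes')
    return data

batch_file = 'data_batch_1'
# However, we don't use data_batch_1 anymore.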
# We use CIFAR-10 for exercise 4
batch_data = unpickle(batch_file)
data = batch_data[b'data']
labels = batch_data[b'labels']
final_data = []
for i1 in range(36):  # Adjust the data into the required shape (32 x 32 x 3)
    new_data = []
    for i in range(1024):
        # Each row stores 1024 red, then 1024 green, then 1024 blue values
        new_data.append([data[i1][i], data[i1][i+1024], data[i1][i+2048]])
    sec_data = []
    for i in range(32):
        sec_data.append(new_data[32*i:32+32*i])
    final_data.append(sec_data)

data = {'Distance from sea (m)': [], 'Distance from ground (m)': [], 'Class': [], 'index': []}
data2 = {'Distance from sea (m)': [], 'Distance from ground (m)': [], 'Class': [], 'index': []}
classes = ['Class 1', 'Class 2', 'Class 3', 'Class 4', 'Class 5',
           'Class 6', 'Class 7', 'Class 8', 'Class 9', 'Class 10']
X = [-10, -100, -120, -300, -125, -127, -155, -300, 150, -200, -350, -306, -289, -345, -125, -128, -136, -208,
     -338, -18, -311, -211, -15, -18, -178, -16, -220, -225, -305, 0, -100, -126, -130, -210, -315, 100]
Y = [80, 100, 100, 150, 100, 100, 80, 200, 0, 100, 103, 200, 200, 103, 100, 100, 100, 100,
     103, 80, 156, 100, 68, 75, 89, 70, 115, 110, 150, 1000, 1000, 100, 100, 115, 150, 1000]

# df keeps the class as a string label (one colour per class)
indexx = 0
for class_label in labels[0:36]:
    data['Distance from sea (m)'].append(X[indexx])
    data['Distance from ground (m)'].append(Y[indexx])
    data['Class'].append(classes[class_label])
    data['index'].append(indexx)
    indexx += 1

# df2 keeps the class as an integer from 1 to 10
indexx = 0
for class_label in labels[0:36]:
    data2['Distance from sea (m)'].append(X[indexx])
    data2['Distance from ground (m)'].append(Y[indexx])
    data2['Class'].append(int(class_label)+1)
    data2['index'].append(indexx)
    indexx += 1

df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
```

### **Part4: Building the Dash Board (plotting tool)**
Set up the Dash Board layout and create update callbacks that keep the charts refreshed, so users can operate the adjustable controls on the charts.
```python=
app = Dash(__name__, external_stylesheets=[dbc.themes.VAPOR])
app.layout = dbc.Container([  # setting the layout of the graphs
    dbc.Row([
        dbc.Col([
            html.H1("DVAI-Ex4"),
            dcc.Graph(id='dynamic-plot'),
            dcc.Dropdown(
                id='plot-type-dropdown',
                options=[{'label': 'Select the type', 'value': 'Select the type'}] + [{'label':
                    p, 'value': p} for p in ['Scatter Plot', 'Clickable Scatter Plot', 'Bar Plot']],
                value='Select the type',
                clearable=False),
            html.Label('Point size (mm):'),
            dcc.Slider(
                id='visualization-slider',
                min=0.5, max=2, step=0.1, value=1,
                marks={i/10: str(i/10) for i in range(5, 21)},
                tooltip={'placement': 'bottom'}),
            html.Label('Image (index):'),
            dcc.Slider(
                id='image-slider',
                min=0, max=len(final_data) - 1, step=1, value=0,
                marks={i: str(i) for i in range(len(final_data))},
                tooltip={'placement': 'bottom'}),
            dcc.Graph(id='image-display')])]),
    dbc.Row([
        dbc.Col([
            html.Div(
                html.H2("PCA", style={'textAlign': 'center', 'width': '100%', 'margin': '10px 0',
                                      'color': 'yellow', 'fontSize': '40px'}),
                style={'width': '100%', 'textAlign': 'center'}
            ),
            dcc.Dropdown(
                id='pca-dropdown',
                options=[{'label': 'Select the type', 'value': 'Select the type'}] +
                        [{'label': p, 'value': p} for p in ['Layer1', 'Layer2', 'Layer3', 'Layer4', 'Layer5']],
                value='Layer1',
                clearable=False),
            dcc.Graph(id='pcaimage-display')])]),
    dbc.Row([
        dbc.Col([
            html.Div(
                html.H2("LIME explanation", style={'textAlign': 'center', 'width': '100%', 'margin': '10px 0',
                                                   'color': 'yellow', 'fontSize': '40px'}),
                style={'width': '100%', 'textAlign': 'center'}
            ),
            dcc.Graph(id='salimage-display')], width=6),
        dbc.Col([
            html.Div(
                html.H2("Probability", style={'textAlign': 'center', 'width': '100%', 'margin': '10px 0',
                                              'color': 'yellow', 'fontSize': '40px'}),
                style={'width': '100%', 'textAlign': 'center'}
            ),
            dcc.Graph(id='proimage-display')], width=10)])
])

@app.callback(  # the callback updates the PCA graph
    Output('pcaimage-display', 'figure'),
    [Input('pca-dropdown', 'value')]
)
def update_pca_display(selected_pca_type):
    if selected_pca_type == 'Layer1':
        fig_pca = pca_show(0)
    elif selected_pca_type == 'Layer2':
        fig_pca = pca_show(1)
    elif selected_pca_type == 'Layer3':
        fig_pca = pca_show(2)
    elif selected_pca_type == 'Layer4':
        fig_pca = pca_show(3)
    elif selected_pca_type == 'Layer5':
        fig_pca = pca_show(4)
    else:
        # 'Select the type' matches no layer; fall back to Layer1 so fig_pca is always defined
        fig_pca = pca_show(0)
    return fig_pca

@app.callback(
    [Output('dynamic-plot', 'figure'),
     Output('image-display', 'figure'),
     Output('salimage-display', 'figure'),
     Output('proimage-display', 'figure'),
     Output('image-slider', 'value')],
    [Input('plot-type-dropdown', 'value'),
     Input('dynamic-plot', 'clickData'),
     Input('visualization-slider', 'value'),
     Input('image-slider', 'value')]
)
def update_dynamic_plot(selected_plot_type, click_data, slider_value, image_index):
    if selected_plot_type == 'Clickable Scatter Plot':
        fig = px.scatter(df2, x='Distance from sea (m)', y='Distance from ground (m)',
                         color='Class', title='Clickable Scatter Plot')
        if click_data is not None:
            # Jump to the image that belongs to the clicked point
            clicked_index = click_data['points'][0]['pointNumber']
            image_index = clicked_index
    elif selected_plot_type == 'Scatter Plot':
        fig = px.scatter(df, x='Distance from sea (m)', y='Distance from ground (m)',
                         color='Class', title='Scatter Plot', category_orders={'Class': classes})
    elif selected_plot_type == 'Bar Plot':
        class_counts = df['Class'].value_counts().reset_index()
        class_counts.columns = ['Class', 'The number of each class']
        # Sort the bars in the order Class 1 ... Class 10
        class_counts = class_counts.sort_values(
            by='Class', key=lambda x: x.map({k: i for i, k in enumerate(classes)}))
        fig = px.bar(class_counts, x='Class', y='The number of each class',
                     color='Class', title='Bar Plot')
    else:
        fig = px.scatter(df, x='Distance from sea (m)', y='Distance from ground (m)',
                         color='Class', title='Scatter Plot', category_orders={'Class': classes})
    if selected_plot_type in ['Scatter Plot', 'Clickable Scatter Plot', 'Select the type']:
        fig.update_traces(marker=dict(size=10 * slider_value))
    fig2 = px.imshow(np.array(test_images[image_index]))
    fig3 = limeee(image_index)
    fig4 = predicting(image_index)
    return fig, fig2, fig3, fig4, image_index
```

### **Part5: Using LIME to explain how the model extracts features**
```python=
def limeee(num):
    # Choose an image for explanation
    image_to_explain = test_images[num]
    # Create a LIME explainer for image data
    explainer = lime_image.LimeImageExplainer()
    segmenter = SegmentationAlgorithm('quickshift',
                                      kernel_size=1, max_dist=200, ratio=0.2)
    # Explain the model's prediction using LIME
    explanation = explainer.explain_instance(image_to_explain, model.predict, top_labels=3, hide_color=0,
                                             num_samples=1000, segmentation_fn=segmenter)
    # Display the LIME explanation with boundaries using Plotly Express.
    # top_labels[2] is the third most likely class; top_labels[0] would explain the top prediction.
    temp, mask = explanation.get_image_and_mask(explanation.top_labels[2], positive_only=True,
                                                num_features=20, hide_rest=False)
    img_boundry = mark_boundaries(temp / 2 + 0.5, mask, color=(1, 1, 0))
    fig_boundaries = px.imshow(img_boundry)
    fig_boundaries.update_layout(title_text="LIME Explanation with Boundaries")
    fig_boundaries.update_xaxes(visible=False)
    fig_boundaries.update_yaxes(visible=False)
    return fig_boundaries
```

### **Part6: Showing the PCA plot**
Visualize the model's dimensionality-reduced data and add a dropdown menu so users can view the PCA plot of different layers.
```python=
def pca_show(num):
    # Get layer names dynamically
    layer_names = [layer.name for layer in model.layers if 'conv2d' in layer.name or 'dense' in layer.name]
    # Get activations of the selected layers
    activations = [model.get_layer(layer_name).output for layer_name in layer_names]
    # Create a model that outputs the activations of the selected layers
    activation_model = keras.models.Model(inputs=model.input, outputs=activations)
    activations_list = activation_model.predict(test_images)
    # Apply PCA to each set of activations (flattened to one row per image)
    activations_pca = [PCA(n_components=2).fit_transform(activation_set.reshape(activation_set.shape[0], -1))
                       for activation_set in activations_list]
    activation_pca = activations_pca[num]
    fig = px.scatter(
        x=activation_pca[:, 0],
        y=activation_pca[:, 1],
        color=np.argmax(test_labels, axis=1),
        labels={'color': 'Classes'},
        title=f'PCA Visualization of {layer_names[num]} Activations')
    fig.update_layout(
        xaxis_title='Principal Component 1',
        yaxis_title='Principal Component 2',
    )
    return fig
```

### **Part7: Showing the model's class probability distribution**
```python=
def predicting(num):
    # Make predictions on the selected image
    predictions = model.predict(test_images[num].reshape((1, 32, 32, 3)))
    predicted_class = np.argmax(predictions)
    true_class = np.argmax(test_labels[num])
    # Check if the prediction is correct
    is_correct = predicted_class == true_class
    # Display the true class and prediction status (Plotly text uses <br> for line breaks)
    fig_true_class = go.Figure()
    fig_true_class.add_trace(go.Scatter(
        x=[0], y=[0], mode='text',
        text=[f"True Class: {true_class+1}<br><br>Predicted Class: {predicted_class+1}"
              f"<br><br>Correct Prediction: {is_correct}"]))
    fig_true_class.update_layout(title_text='True Class and Prediction Status')
    fig_true_class.update_xaxes(visible=False)
    fig_true_class.update_yaxes(visible=False)
    # Visualize the prediction distribution
    fig_prediction_dist = px.bar(x=list(range(1, 11)), y=predictions.flatten(),
                                 color_discrete_sequence=['blue'], opacity=0.7,
                                 labels={'x': 'Class', 'y': 'Probability'})
    fig_prediction_dist.update_layout(title_text='Prediction Distribution')
    fig_prediction_dist.update_xaxes(title_text='Class')
    fig_prediction_dist.update_yaxes(title_text='Probability')
    # Combine the subplots using make_subplots
    fig = make_subplots(rows=2, cols=1,
                        subplot_titles=['True Class and Prediction Status', 'Prediction Distribution'])
    fig.add_trace(fig_true_class.data[0], row=1, col=1)
    fig.add_trace(fig_prediction_dist.data[0], row=2, col=1)
    # Mark the true class with a dashed line. The bars are labelled 1 to 10, so the
    # 0-based index is shifted by one. The shape goes on the combined figure because
    # only the traces are copied over from the two figures above.
    true_x = int(true_class) + 1
    fig.add_shape(type='line', x0=true_x, x1=true_x, y0=0, y1=1,
                  line=dict(color='red', dash='dash', width=1), row=2, col=1)
    fig.update_layout(height=600, title_text="")
    return fig

if __name__ == '__main__':
    # Start the server last, so every function used by the callbacks is already defined.
    # On recent Dash releases, app.run(port=8054) is the equivalent call.
    app.run_server(port=8054)
```
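
The check described for the probability distribution chart (when the top prediction is wrong, is the correct class at least the runner-up?) can also be done in code rather than by eye. The block below is a minimal sketch and not part of the original project: `true_class_rank` is a hypothetical helper, and `probs` is made-up sample data that mirrors the example above, where Class 4 is most likely and Class 6 comes second.

```python
def true_class_rank(probabilities, true_class):
    """Return the 1-based rank of true_class when classes are sorted by
    descending probability (1 means the top prediction is correct)."""
    # Sort class indices from most to least probable; ties keep index order.
    order = sorted(range(len(probabilities)), key=lambda i: -probabilities[i])
    return order.index(true_class) + 1


# Hypothetical softmax output for one image (index 0 corresponds to Class 1).
probs = [0.02, 0.03, 0.05, 0.40, 0.05, 0.30, 0.05, 0.04, 0.03, 0.03]

print(true_class_rank(probs, 3))  # rank when the true class is Class 4
print(true_class_rank(probs, 5))  # rank when the true class is Class 6
```

Inside `predicting`, the same helper could presumably be called as `true_class_rank(predictions.flatten(), true_class)`; a rank of 2 would mean the model was wrong but the correct class was its second choice.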