Data Visualization Assignment Project

[TOC]

# Data Visualization Assignment Project

## Project Goal

All test data in this project comes from CIFAR-10, a dataset of images that are already sorted into classes. The goal of the project is to classify the images, present the results as line charts or scatter plots, and let users interact with the charts.

![image](https://hackmd.io/_uploads/ryQ4NF936.png)

[Test set source](https://www.cs.toronto.edu/~kriz/cifar.html)

## **The interactive charts on the Dash Board:**

### Part1: Plotting the image set as a scatter plot

**The first figure shows the whole interactive interface. Users click "Select the type" to choose which chart to display (e.g. PCA, LIME explanation...).**

We place the objects on two axes, "distance from the sea" and "altitude of the location". A ship, for example, sits out at sea at a low altitude. Users can click a point on the plot to find out which object it represents.

![image](https://hackmd.io/_uploads/SkoYltq2T.png)

![image](https://hackmd.io/_uploads/SkXplYq3T.png)

### Part2: Showing the PCA plot of different layers

PCA (Principal Component Analysis) is a common dimensionality-reduction technique. It reduces the number of dimensions of a dataset while keeping most of the information, which makes the data easier to understand and analyze. Users can select different layers of the convolutional network to view the PCA result of each one. The PCA plot of Layer5 shows a clearer pattern than that of Layer1, which suggests that the features get closer to what we expect as the data moves through the network.

![image](https://hackmd.io/_uploads/SJaxbKq26.png)

![image](https://hackmd.io/_uploads/BybC2t9nT.png)

### Part3: Showing the LIME explanation of a selected image

LIME (Local Interpretable Model-agnostic Explanations) is a method for explaining the predictions of a machine learning model. It highlights the image regions that the model considers important. We display the result on the Dash Board.

![image](https://hackmd.io/_uploads/Hks7-Fcn6.png)

### Part4: Showing the class probability distribution of a selected image

The probability distribution chart shows how the model classifies the selected object. In the figure below, the model gives the highest probability to class 4 and the second highest to class 6, and the prediction is correct. When a prediction is wrong, we can also use the chart to check whether the class with the second-highest probability is the correct one, which tells us how close the training is to a correct result.

![image](https://hackmd.io/_uploads/BJZPWKch6.png)

# Implementation

### **Part1: Declarations and loading the data**

```python=
import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import to_categorical
from keras.optimizers import Adam
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries
from skimage.segmentation import slic
from skimage.color import gray2rgb
from lime.wrappers.scikit_image import SegmentationAlgorithm
from sklearn.decomposition import PCA
from dash import Dash, dcc, html, Input, Output
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import dash_bootstrap_components as dbc
import pickle
import pandas as pd

# Load CIFAR-10 data
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()

# Preprocess the data: scale pixel values to [0, 1]
train_images = train_images.astype('float32') / 255.0
test_images = test_images.astype('float32') / 255.0

# One-hot encode the labels
train_labels = to_categorical(train_labels, 10)
test_labels = to_categorical(test_labels, 10)
```

### **Part2: Training on the dataset**

We use a Keras `Sequential` model for this training run.

```python=
# Model definition
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Model compilation
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Model training (the number of epochs can be changed here)
history = model.fit(train_images, train_labels, epochs=1, batch_size=64,
                    validation_data=(test_images, test_labels))
```

### **Part3: Preparing the data for the charts**

Convert the image arrays into the format Plotly needs, and define X and Y coordinates that match the class of each sample so that the samples can be shown on the coordinate plot.

```python=
def unpickle(file):
    # Load one pickled CIFAR-10 batch file into a dict with byte-string keys
    with open(file, 'rb') as fo:
        data = pickle.load(fo, encoding='bytes')
    return data

# However, we don't use the images of data_batch_1 anymore; exercise 4 uses the
# CIFAR-10 set loaded in Part1. The labels below still come from this batch.
batch_file = 'data_batch_1'
batch_data = unpickle(batch_file)
data = batch_data[b'data']
labels = batch_data[b'labels']

final_data = []
for i1 in range(36):
    # Adjust each flat 3072-value row into a 32x32 grid of [R, G, B] pixels
    new_data = []
    for i in range(1024):
        new_data.append([data[i1][i], data[i1][i + 1024], data[i1][i + 2048]])
    sec_data = []
    for i in range(32):
        sec_data.append(new_data[32 * i:32 + 32 * i])
    final_data.append(sec_data)

# df uses class names (categorical colors); df2 uses class numbers (continuous colors)
data = {'Distance from sea (m)': [], 'Distance from ground (m)': [], 'Class': [], 'index': []}
data2 = {'Distance from sea (m)': [], 'Distance from ground (m)': [], 'Class': [], 'index': []}
classes = ['Class 1', 'Class 2', 'Class 3', 'Class 4', 'Class 5',
           'Class 6', 'Class 7', 'Class 8', 'Class 9', 'Class 10']

# Hand-picked coordinates for the first 36 samples
X = [-10, -100, -120, -300, -125, -127, -155, -300, 150, -200, -350, -306,
     -289, -345, -125, -128, -136, -208, -338, -18, -311, -211, -15, -18,
     -178, -16, -220, -225, -305, 0, -100, -126, -130, -210, -315, 100]
Y = [80, 100, 100, 150, 100, 100, 80, 200, 0, 100, 103, 200,
     200, 103, 100, 100, 100, 100, 103, 80, 156, 100, 68, 75,
     89, 70, 115, 110, 150, 1000, 1000, 100, 100, 115, 150, 1000]

for indexx, class_label in enumerate(labels[0:36]):
    data['Distance from sea (m)'].append(X[indexx])
    data['Distance from ground (m)'].append(Y[indexx])
    data['Class'].append(classes[class_label])
    data['index'].append(indexx)

    data2['Distance from sea (m)'].append(X[indexx])
    data2['Distance from ground (m)'].append(Y[indexx])
    data2['Class'].append(int(class_label) + 1)
    data2['index'].append(indexx)

df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
```

### **Part4: Building the Dash Board (plotting tool)**

Set the layout of the Dash Board and create the `update_*` callbacks that redraw the charts, so that users can operate the adjustable controls on the charts.

```python=
app = Dash(__name__, external_stylesheets=[dbc.themes.VAPOR])

# Styles shared by the three section headings
heading_style = {'textAlign': 'center', 'width': '100%', 'margin': '10px 0',
                 'color': 'yellow', 'fontSize': '40px'}
wrapper_style = {'width': '100%', 'textAlign': 'center'}

app.layout = dbc.Container([  # setting the layout of graph
    dbc.Row([
        dbc.Col([
            html.H1("DVAI-Ex4"),
            dcc.Graph(id='dynamic-plot'),
            dcc.Dropdown(
                id='plot-type-dropdown',
                options=[{'label': 'Select the type', 'value': 'Select the type'}] +
                        [{'label': p, 'value': p}
                         for p in ['Scatter Plot', 'Clickable Scatter Plot', 'Bar Plot']],
                value='Select the type',
                clearable=False),
            html.Label('Point size (mm):'),
            dcc.Slider(
                id='visualization-slider', min=0.5, max=2, step=0.1, value=1,
                marks={i/10: str(i/10) for i in range(5, 21)},
                tooltip={'placement': 'bottom'}),
            html.Label('Image (index):'),
            dcc.Slider(
                id='image-slider', min=0, max=len(final_data) - 1, step=1, value=0,
                marks={i: str(i) for i in range(len(final_data))},
                tooltip={'placement': 'bottom'}),
            dcc.Graph(id='image-display')])]),
    dbc.Row([
        dbc.Col([
            html.Div(html.H2("PCA", style=heading_style), style=wrapper_style),
            dcc.Dropdown(
                id='pca-dropdown',
                options=[{'label': 'Select the type', 'value': 'Select the type'}] +
                        [{'label': p, 'value': p}
                         for p in ['Layer1', 'Layer2', 'Layer3', 'Layer4', 'Layer5']],
                value='Layer1',
                clearable=False),
            dcc.Graph(id='pcaimage-display')])]),
    dbc.Row([
        dbc.Col([
            html.Div(html.H2("LIME explanation", style=heading_style), style=wrapper_style),
            dcc.Graph(id='salimage-display')], width=6),
        dbc.Col([
            html.Div(html.H2("Probability", style=heading_style), style=wrapper_style),
            dcc.Graph(id='proimage-display')], width=10)])
])

@app.callback(  # callback is for updating the graph
    Output('pcaimage-display', 'figure'),
    [Input('pca-dropdown', 'value')]
)
def update_pca_display(selected_pca_type):
    # Map the dropdown label to a layer index. The 'Select the type'
    # placeholder falls back to the first layer instead of raising an error.
    layer_indices = {'Layer1': 0, 'Layer2': 1, 'Layer3': 2, 'Layer4': 3, 'Layer5': 4}
    return pca_show(layer_indices.get(selected_pca_type, 0))
@app.callback(
    [Output('dynamic-plot', 'figure'),
     Output('image-display', 'figure'),
     Output('salimage-display', 'figure'),
     Output('proimage-display', 'figure'),
     Output('image-slider', 'value')],
    [Input('plot-type-dropdown', 'value'),
     Input('dynamic-plot', 'clickData'),
     Input('visualization-slider', 'value'),
     Input('image-slider', 'value')]
)
def update_dynamic_plot(selected_plot_type, click_data, slider_value, image_index):
    if selected_plot_type == 'Clickable Scatter Plot':
        fig = px.scatter(df2, x='Distance from sea (m)', y='Distance from ground (m)',
                         color='Class', title='Clickable Scatter Plot')
        if click_data is not None:
            # Show the image that belongs to the clicked point
            clicked_index = click_data['points'][0]['pointNumber']
            image_index = clicked_index
    elif selected_plot_type == 'Bar Plot':
        class_counts = df['Class'].value_counts().reset_index()
        class_counts.columns = ['Class', 'The number of each class']
        # Sort the bars in class order rather than by count
        class_counts = class_counts.sort_values(
            by='Class', key=lambda x: x.map({k: i for i, k in enumerate(classes)}))
        fig = px.bar(class_counts, x='Class', y='The number of each class',
                     color='Class', title='Bar Plot')
    else:
        # 'Scatter Plot' and the 'Select the type' placeholder draw the same figure
        fig = px.scatter(df, x='Distance from sea (m)', y='Distance from ground (m)',
                         color='Class', title='Scatter Plot',
                         category_orders={'Class': classes})

    if selected_plot_type in ['Scatter Plot', 'Clickable Scatter Plot', 'Select the type']:
        fig.update_traces(marker=dict(size=10 * slider_value))

    fig2 = px.imshow(np.array(test_images[image_index]))
    fig3 = limeee(image_index)
    fig4 = predicting(image_index)
    return fig, fig2, fig3, fig4, image_index
```

### **Part5: Using LIME to explain how the model extracts features**

```python=
def limeee(num):
    # Choose an image for explanation (cast to float64, which some
    # scikit-image versions require for quickshift segmentation)
    image_to_explain = test_images[num].astype('double')

    # Create a LIME explainer for image data
    explainer = lime_image.LimeImageExplainer()
    segmenter = SegmentationAlgorithm('quickshift', kernel_size=1, max_dist=200, ratio=0.2)

    # Explain the model's prediction using LIME
    explanation = explainer.explain_instance(
        image_to_explain, model.predict, top_labels=3, hide_color=0,
        num_samples=1000, segmentation_fn=segmenter)

    # top_labels is sorted by predicted probability, so index 0 is the
    # class the model actually predicts
    temp, mask = explanation.get_image_and_mask(
        explanation.top_labels[0], positive_only=True, num_features=20, hide_rest=False)

    # Display the LIME explanation with boundaries using Plotly Express.
    # Pixel values are already in [0, 1], so no rescaling is applied.
    img_boundry = mark_boundaries(temp, mask, color=(1, 1, 0))
    fig_boundaries = px.imshow(img_boundry)
    fig_boundaries.update_layout(title_text="LIME Explanation with Boundaries")
    fig_boundaries.update_xaxes(visible=False)
    fig_boundaries.update_yaxes(visible=False)
    return fig_boundaries
```

### **Part6: Showing the PCA plot**

Visualize the dimensionality-reduced activations of the model. A dropdown menu lets users view the PCA plot of different layers.

```python=
def pca_show(num):
    # Get layer names dynamically (three Conv2D layers and two Dense layers)
    layer_names = [layer.name for layer in model.layers
                   if 'conv2d' in layer.name or 'dense' in layer.name]
    layer_name = layer_names[num]

    # Create a model that outputs the activations of the selected layer only
    activation_model = keras.models.Model(
        inputs=model.input, outputs=model.get_layer(layer_name).output)
    activations = activation_model.predict(test_images)

    # Flatten each activation map to one row per image, then apply PCA
    activation_pca = PCA(n_components=2).fit_transform(
        activations.reshape(activations.shape[0], -1))

    fig = px.scatter(
        x=activation_pca[:, 0],
        y=activation_pca[:, 1],
        color=np.argmax(test_labels, axis=1),
        labels={'color': 'Classes'},
        title=f'PCA Visualization of {layer_name} Activations')
    fig.update_layout(
        xaxis_title='Principal Component 1',
        yaxis_title='Principal Component 2',
    )
    return fig
```

### **Part7: Showing the probability distribution of the model's classification**

```python=
def predicting(num):
    # Make predictions on the selected image
    predictions = model.predict(test_images[num].reshape((1, 32, 32, 3)))
    predicted_class = np.argmax(predictions)
    true_class = np.argmax(test_labels[num])

    # Check if the prediction is correct
    is_correct = predicted_class == true_class

    # Combine the text panel and the bar chart using make_subplots
    fig = make_subplots(rows=2, cols=1,
                        subplot_titles=['True Class and Prediction Status',
                                        'Prediction Distribution'])

    # Display the true class and prediction status (Plotly breaks lines with <br>)
    fig.add_trace(go.Scatter(
        x=[0], y=[0], mode='text',
        text=[f"True Class: {true_class+1}<br><br>Predicted Class: {predicted_class+1}"
              f"<br><br>Correct Prediction: {is_correct}"]), row=1, col=1)
    fig.update_xaxes(visible=False, row=1, col=1)
    fig.update_yaxes(visible=False, row=1, col=1)

    # Visualize the prediction distribution; classes are shown as 1..10
    fig.add_trace(go.Bar(x=list(range(1, 11)), y=predictions.flatten(),
                         marker_color='blue', opacity=0.7), row=2, col=1)
    # Mark the true class with a dashed line (the +1 matches the 1-based x axis)
    fig.add_shape(type='line',
                  x0=int(true_class) + 1, x1=int(true_class) + 1, y0=0, y1=1,
                  line=dict(color='red', dash='dash', width=1), row=2, col=1)
    fig.update_xaxes(title_text='Class', row=2, col=1)
    fig.update_yaxes(title_text='Probability', row=2, col=1)

    fig.update_layout(height=600, title_text="")
    return fig

# Start the server only after every function used by the callbacks is defined
# (recent Dash releases expose the same entry point as app.run)
if __name__ == '__main__':
    app.run_server(port=8054)
```
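The nested loops in Part3 turn each flat 3072-value CIFAR-10 row (1024 red, then 1024 green, then 1024 blue values) into a 32x32 grid of RGB pixels. As a rough sketch, the same layout can be produced with a NumPy reshape and transpose. The function names `rows_to_images` and `rows_to_images_loop` and the random `fake_batch` are illustrative assumptions, not part of the project code.

```python
import numpy as np


def rows_to_images(flat_rows, height=32, width=32, channels=3):
    """Convert channel-major flat rows into (N, height, width, channels) images."""
    flat_rows = np.asarray(flat_rows)
    # Each row stores all red values, then all green, then all blue
    planes = flat_rows.reshape(-1, channels, height, width)
    # Move the channel axis last so every pixel becomes [R, G, B]
    return planes.transpose(0, 2, 3, 1)


def rows_to_images_loop(flat_rows):
    """Loop version that mirrors the Part3 code, kept for comparison."""
    images = []
    for row in flat_rows:
        pixels = [[row[i], row[i + 1024], row[i + 2048]] for i in range(1024)]
        images.append([pixels[32 * r:32 * r + 32] for r in range(32)])
    return np.array(images)


# Hypothetical stand-in for batch_data[b'data']: 4 random rows of 3072 bytes
rng = np.random.default_rng(0)
fake_batch = rng.integers(0, 256, size=(4, 3072), dtype=np.uint8)

images = rows_to_images(fake_batch)
print(images.shape)
```

Both versions give the same pixel layout. The vectorized form avoids building Python lists, which matters once more than a handful of images are converted.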
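The probability chart is also meant for checking, when a prediction is wrong, whether the second most probable class is the correct one. A minimal sketch of that check is shown below, assuming a 10-value softmax vector and the 1-based class numbers used on the Dash Board. The helper `rank_classes` and the example probabilities are hypothetical and only echo the "class 4 first, class 6 second" example.

```python
import numpy as np


def rank_classes(probabilities, true_class, k=2):
    """Return the k most probable classes (1-based) and the rank of the true class."""
    probabilities = np.asarray(probabilities, dtype=float)
    # Class indices sorted from the most to the least probable
    order = np.argsort(probabilities)[::-1]
    top_k = (order[:k] + 1).tolist()
    # Position of the true class in that ordering (1 means it was predicted)
    rank_of_truth = int(np.where(order == true_class - 1)[0][0]) + 1
    return top_k, rank_of_truth


# Illustrative probabilities only, not real model output
probs = [0.02, 0.03, 0.05, 0.55, 0.05, 0.20, 0.04, 0.03, 0.02, 0.01]

top_two, rank_if_class_4 = rank_classes(probs, true_class=4)
_, rank_if_class_6 = rank_classes(probs, true_class=6)
print(top_two, rank_if_class_4, rank_if_class_6)
```

A rank of 2 for the true class means the model was wrong but close, which is the situation the chart is meant to reveal.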
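The PCA step in Part6 flattens the activations of one layer to one row per image and projects them onto two components. The sketch below shows the idea with plain NumPy and an SVD. It is not the scikit-learn implementation used in the project, and the signs of its components may differ from what `PCA(n_components=2)` returns. The function `pca_2d` and the random `fake_activations` are assumptions for illustration.

```python
import numpy as np


def pca_2d(activations):
    """Project activations of shape (N, ...) onto their first two principal components."""
    flat = np.asarray(activations, dtype=float).reshape(len(activations), -1)
    # PCA works on mean-centered data
    centered = flat - flat.mean(axis=0)
    # Rows of `components` are the principal directions, sorted by variance
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ components[:2].T
    # Share of the total variance captured by each of the two components
    explained = singular_values[:2] ** 2 / np.sum(singular_values ** 2)
    return projected, explained


# Hypothetical stand-in for one layer's output: 50 images, 4x4 maps, 8 filters
rng = np.random.default_rng(1)
fake_activations = rng.normal(size=(50, 4, 4, 8))

projected, explained = pca_2d(fake_activations)
print(projected.shape, explained)
```

The `explained` ratios give a quick sense of how much of the layer's variation a 2D scatter plot can show at all, which helps when comparing Layer1 with Layer5.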
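Part1 one-hot encodes the labels, and Part6 and Part7 recover the class index again with `np.argmax`. The round trip can be sketched in NumPy as follows. The helper `one_hot` is a hypothetical stand-in for illustration and is not claimed to match `to_categorical` in every detail.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Encode integer labels of shape (N,) or (N, 1) as rows with a single 1."""
    labels = np.asarray(labels).reshape(-1)
    encoded = np.zeros((labels.size, num_classes), dtype='float32')
    # Set the column that matches each label to 1
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


# Labels in the (N, 1) column shape that Keras returns for CIFAR-10
labels = np.array([[6], [9], [9], [4]])

encoded = one_hot(labels, 10)
# argmax along each row recovers the original class index
decoded = np.argmax(encoded, axis=1)
print(encoded.shape, decoded)
```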