# Deep learning for visual understanding: A review. | Guo, Y., Liu, Y., Oerlemans, A., … Neurocomputing, 187, 27–48.

###### tags: `deep learning`

> Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S., & Lew, M. S. (2016). Deep learning for visual understanding: A review. Neurocomputing, 187, 27–48. https://doi.org/10.1016/j.neucom.2015.09.116

Index Terms—Deep learning; Computer vision; Developments; Applications; Trends; Challenges.

### INTRODUCTION

There are three main reasons for the boom of deep learning today:

* The dramatically increased processing power of chips (e.g., GPUs).
* The significantly lowered cost of computing hardware.
* Considerable advances in machine learning algorithms.

### Convolutional Neural Networks (CNNs)

![](https://i.imgur.com/WNWnr8I.png)

#### Convolutional layers

Convolutional layers offer three main advantages (see the NumPy sketch at the end of this note):

* The weight-sharing mechanism within a feature map reduces the number of parameters.
* Local connectivity captures correlations among neighboring pixels.
* Invariance to the location of the object.

![](https://i.imgur.com/7qNn4pz.png)

#### Pooling layers

* A pooling layer reduces the dimensions of the feature maps and the number of network parameters.
* Like convolutional layers, pooling layers compute over neighboring pixels, so they are also translation invariant.
* Max pooling and average pooling are the most commonly used strategies.

![](https://i.imgur.com/6G10oCr.png)

#### Fully-connected layers

* They convert the 2D feature maps into a 1D feature vector.
* They contain about 90% of the parameters of a CNN.

![](https://i.imgur.com/Pto92BE.png)

### Training strategy

Compared to shallow learning, the advantage of deep learning is that it can build deep architectures that learn more abstract information. However, the large number of parameters this introduces may also lead to another problem: **overfitting**.

![](https://i.imgur.com/4sU31ap.png)
> http://wiki.bethanycrane.com/overfitting-of-data

#### Dropout

Dropout was proposed by Hinton et al. and explained in depth by Baldi et al. The algorithm randomly omits half of the feature detectors on each training case, in order to prevent complex co-adaptations on the training data and to improve generalization (see the dropout sketch at the end of this note).

![](https://i.imgur.com/swtBFvi.png)

A comparison of No-Drop and Dropout networks

![](https://i.imgur.com/MnG0AVH.png)
> Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15, 1929–1958.

> Dropout succeeds because dropping some of the neurons means that each mini-batch effectively trains a different sub-network of the original network, so the original network becomes an ensemble of these sub-networks. At test time dropout is not used, and the output approximates the expected value over these sub-networks, which is why performance improves.

### Auto-Encoder

The auto-encoder is a special type of artificial neural network used for learning efficient codings.

> An auto-encoder learns a compact encoding for a set of input data, typically with the goal of dimensionality reduction.
> As shown in the figure, the network consists of an input layer, an output layer, and a hidden layer in between. The aim is to model the training data through a dimension-reducing encoding, which can then be used for tasks such as categorizing the data.
> The overall idea is that the encoded features, after passing through the decoding step, can be restored to match the original input, as in the example figure.
> The more informative the features the encoder learns, the more accurate the reconstruction. In an auto-encoder network, the size of the hidden layer is critical: if it is too large, feature learning becomes meaningless; if it is too small, the features carry too little information to reconstruct the input.
> Unlike a conventional neural network that takes an input X to predict Y, this kind of network aims to learn a function whose output Y approximates the original input X. The method applies to both linear (real-valued) and nonlinear (Boolean) problems, combines easily with other neural-network techniques, and is especially suitable for real-valued data (a minimal sketch appears at the end of this note).

![](https://i.imgur.com/10xpoc2.png)
> Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the Dimensionality of Data with Neural Networks. Science, 313(5786), 504–507. https://doi.org/10.1126/science.1127647
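### Code sketches

To make the three CNN layer types concrete, here is a minimal NumPy sketch; all shapes and sizes are illustrative assumptions rather than values from the paper. One shared 5×5 filter is convolved over an image, the feature map is max-pooled, and the flattened result feeds a fully-connected layer. The parameter counts printed at the end show why the fully-connected layers hold the bulk of a CNN's parameters.

```python
# Minimal sketch of the three CNN building blocks discussed above:
# a convolutional layer (shared weights, local connectivity), max pooling,
# and a fully-connected layer. Sizes are illustrative, not from the paper.
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution: ONE shared kernel slides over the whole image."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Local connectivity: each output depends only on a kH x kW patch.
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the strongest response per window."""
    H, W = fmap.shape
    H2, W2 = H // size, W // size
    return fmap[:H2 * size, :W2 * size].reshape(H2, size, W2, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.standard_normal((28, 28))    # a single-channel input image
kernel = rng.standard_normal((5, 5))     # one shared 5x5 filter -> 25 weights

fmap = conv2d(image, kernel)             # 24 x 24 feature map
pooled = max_pool(fmap)                  # 12 x 12 after 2x2 max pooling
vector = pooled.reshape(-1)              # flatten the 2D map into a 1D vector

W_fc = rng.standard_normal((10, vector.size))  # fully-connected layer, 10 outputs
logits = W_fc @ vector

print("conv weights:", kernel.size)      # 25 shared weights
print("fc weights:  ", W_fc.size)        # 1440 -- the FC layer dominates
```

Even in this tiny network, the 25 shared convolution weights are dwarfed by the 1,440 fully-connected weights, which illustrates the roughly 90% figure quoted above.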
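Next, a sketch of dropout with p = 0.5 applied to one layer's activations. It uses the "inverted dropout" variant, which scales the surviving activations by 1/(1−p) at training time; this matches, in expectation, the formulation of Srivastava et al., where the weights are instead scaled at test time.

```python
# Minimal (inverted) dropout sketch, assuming p = 0.5 as in Hinton et al.
# Each mini-batch samples a fresh mask, so each batch effectively trains a
# different sub-network; at test time nothing is dropped and the output
# approximates the ensemble average over those sub-networks.
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, train=True):
    if not train:
        return activations  # test time: use the full network unchanged
    mask = rng.random(activations.shape) >= p   # keep each unit with prob 1-p
    # Scale by 1/(1-p) so the expected activation matches test time.
    return activations * mask / (1.0 - p)

h = rng.standard_normal(8)        # activations of one hidden layer
print(dropout(h, train=True))     # roughly half the units zeroed out
print(dropout(h, train=False))    # unchanged at test time
```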
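Finally, a minimal single-hidden-layer auto-encoder trained by plain gradient descent on synthetic data; the layer sizes, learning rate, and data here are illustrative assumptions. The 3-unit bottleneck plays the role of the hidden layer discussed above: small enough to force a compressed code, from which the 8-dimensional input must be reconstructed.

```python
# Minimal auto-encoder sketch: encode 8 dims -> 3 dims -> decode back to 8.
# The training target is the input itself, i.e. output Y approximates input X.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hidden = 8, 3
X = rng.random((200, n_in))                        # 200 real-valued inputs

W1 = rng.standard_normal((n_in, n_hidden)) * 0.1   # encoder weights
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, n_in)) * 0.1   # decoder weights
b2 = np.zeros(n_in)

lr = 0.5
for epoch in range(2000):
    code = sigmoid(X @ W1 + b1)        # encode: compress to 3 dimensions
    recon = sigmoid(code @ W2 + b2)    # decode: reconstruct the input
    err = recon - X                    # reconstruction error (target is X)

    # Backpropagate the mean squared reconstruction error.
    d_recon = err * recon * (1 - recon)
    d_code = (d_recon @ W2.T) * code * (1 - code)
    W2 -= lr * code.T @ d_recon / len(X)
    b2 -= lr * d_recon.mean(axis=0)
    W1 -= lr * X.T @ d_code / len(X)
    b1 -= lr * d_code.mean(axis=0)

print("mean reconstruction error:", np.mean(err ** 2))
```

The reconstruction error falls as the bottleneck code becomes more informative, which is exactly the trade-off described above: a hidden layer that is too small cannot carry enough information to restore the input, while one that is too large makes the learned features trivial.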