## 1. Introduction to Deep Learning 深度學習入門課程 <details> <summary>摘要</summary> ### 準備好嘗試了嗎?讓我們開始吧。 **Ready to give it a try? Let's get started.** --- ### 開始介紹 **Hi, my name is Danielle and I'm a content developer here at NVIDIA.** 大家好,我叫 Danielle,是 NVIDIA 的內容開發人員。 --- ## 深度學習的挑戰與價值 --- 研究人工智慧可能非常具有挑戰性,但也非常有價值。 **Studying AI could be pretty challenging, but it's also very rewarding.** 不僅可以了解機器人如何思考,但也可以學習一些技巧來幫助人類成為更有效率的學習者。 **Not only can we learn about how robots think, but we can also pick up a few tricks to help humans be more efficient learners too.** 這就是第一堂課的內容。 **That's what this first lecture is going to be about.** 興奮的。我也是。我們開始吧。 **Excited. Me too. Let's get going.** --- ### 技術配置與步驟 好的,我的電腦在這兒,攝影機在這兒,在我們進入理論之前我們將先討論一些技術問題。 **OK so I have my computer down here and the cameras up there, and we're going to go over some technical things before we jump into the theory.** 在這段影片下面,您應該會看到類似的內容…右上角有一個開始按鈕。 **So below this video you should hopefully see something like this, with a start button in the top right.** - **單擊「開始」以喚醒 GPU** - **查看幻燈片的副本** 最酷的一點是,您可以按此鍵來擴展它 ‘開啟一個新視窗’。 **What's really cool is that you can expand it by pressing this 'Open a new window'.** --- ### 投影片與筆記的使用 在右下角… 您也可以按一下註釋。這是第一張投影片。它沒有任何註釋。 **In the bottom right here… You can also click on the notes. This is the first slide. It doesn't have any notes.** --- ## 本課程的目標 --- 人工智慧和機器學習?它們是巨大的領域。這門課程有點像一個品嚐者,一個對即將發生的事情的預告。 **Artificial intelligence and machine learning? They're huge fields. This course is kind of like a taster, a teaser of what’s to come.** --- ### 訓練過程中的模型探索 好吧,當 GPU 喚醒時,它會看起來像這樣。 **Alright, so when the GPU wakes up, it's going to look something like this.** 當我單擊此啟動任務按鈕時, 它將帶我到 Jupiter 筆記本。 **When I click on this launch task button, it's going to take me to the Jupyter notebook.** 在左邊是我們的檔案目錄。繼續按字母順序閱讀。 **On the left here, we have our directory of files. Go ahead and read them in alphabetical order.** --- ## 什麼是深度學習? 
深度學習顛覆了傳統程式設計。 **Deep learning flips traditional programming around.** --- ### 深度學習的特點 深度學習與傳統機器學習的區別之一是網路的深度和複雜性。 **One thing that separates deep learning from traditional machine learning is the depth and complexity of the networks used for especially complicated tasks.** 深度學習的「深」指的是模型的許多層,這些層學習完成任務所需的重要規則。 **The ‘deep’ in deep learning refers to many layers of the model that learn the important rules necessary for completing a task.** --- ## 深度學習在現代的影響 深度學習在過去十年左右成為了一個主要因素,對許多行業產生了巨大影響。 **Deep learning has only been a major factor in the last decade or so, but it has had a huge impact in many industries.** --- ### 深度學習的實際應用 例如, 高度準確的即時語言翻譯, 語音辨識和虛擬協助。 **For example, on highly accurate real-time language translation, voice recognition, and virtual assistance.** 基於深度學習的推薦系統透過 Facebook 等精選內容、Spotify 等音樂和 YouTube 影片為網路提供動力。 **Recommender systems based on deep learning fuel the Internet with curated content feeds like Facebook, music like Spotify, and video like YouTube.** --- ## 強化學習與遊戲 強化學習在匹配人類專家表現方面取得了令人難以置信的成果,例如 AI AlphaGo。 **Reinforcement learning has achieved incredible results in matching human expert performance, such as AI AlphaGo.** --- ### 結論與挑戰 **講完人類心理學之後,讓我們進入機器人心理學。** With all that human psychology out of the way, let's go into robot psychology. 希望您將本資料視為一項有趣的挑戰。參加編碼練習,您將可以在一年內存取資料。 **We hope you approach this material as a fun challenge. If you take part in coding exercises, you'll have access to the material for up to a year.** 準備好嘗試了嗎?讓我們開始吧。 **Ready to give it a try? Let's get started.** --- --- </details> <details> <summary>中英原文</summary> Theory 理論 Ready to give it a try? Let's get started.準備好嘗試了嗎?讓我們開始吧。 Hi, my name is Danielle and I'm a content developer here at NVIDIA. I'm looking forward to being your guide during this introduction to deep learning. 大家好,我叫 Danielle,是 NVIDIA 的內容開發人員。我期待在深度學習介紹中成為您的嚮導。 Studying AI could be pretty challenging, but it's also very rewarding. 
研究人工智慧可能非常具有挑戰性,但也非常有價值。 Not only can 不僅可以 we learn about how robots think, but we can also pick up a few tricks to help humans be more efficient learners too. 我們了解機器人如何思考,但我們也可以學習一些技巧來幫助人類成為更有效率的學習者。 That's what this first lecture is going to be about. 這就是第一堂課的內容。 Excited? Me too. Let's get going. 興奮嗎?我也是。我們開始吧。 OK so I have my computer down here and the cameras up there and we're going to go over some technical things before we jump into the theory. So below this video you should hopefully see something like… this, that has the start button up in the top right. So go ahead and click it and that's going to wake up our GPU and when we do that, we can also see a copy of the slides that we'll be going over. 好的,我的電腦在這兒,攝影機在這兒,在我們進入理論之前我們將先討論一些技術問題。因此,在這段影片下面,您應該會看到類似的內容…這個,右上角有一個開始按鈕。因此,繼續單擊它,這將喚醒我們的 GPU,當我們這樣做時,我們還可以看到我們將要查看的幻燈片的副本。 And what's really cool about this, is down here you can expand it by pressing this 最酷的一點是,您可以按此鍵來擴展它 ‘Open a new window’. 「開啟一個新視窗」。 And in the bottom right here… 在右下角… You can also click on the notes. So, this is the first slide. It doesn't have any notes. 您也可以按一下註釋。這是第一張投影片。它沒有任何註釋。 But as we go through it, 但當我們經歷它時, You can see a transcript of what we'll be talking about, and I also like to sneak in little extra bits of trivia in here and some links, so hopefully that helps. 您可以看到我們將要討論的內容的文字記錄,我也喜歡在這裡添加一些額外的瑣事和一些鏈接,所以希望這會有所幫助。 Alright, so when the GPU wakes up, it's going to look something like… 好吧,當 GPU 喚醒時,它會看起來像… This. 這。 Now, when I click on this launch task button… 現在,當我單擊此啟動任務按鈕時... It's going to take me to the Jupyter notebook, so there's a couple more things I'd like 它將帶我到 Jupyter 筆記本,所以還有幾件事我想 to share with you. 與您分享。 On the left here, we have our directory of files. Go ahead and read 左邊是我們的檔案目錄。請繼續 them in alphabetical order. 按字母順序閱讀它們。 When you're done, I highly, highly recommend saving your work 完成後,我強烈建議保存您的工作
In the future, if you want to be able to reference that downloaded notebook, we can go ahead and upload it again by clicking this up arrow… 右鍵單擊它並下載。將來,如果您希望能夠引用下載的筆記本,我們可以透過點擊此向上箭頭再次上傳... And uploading it. 並上傳它。 Now these notebooks are pretty small. Hopefully you won't run into any memory issues, but if you do, we can click here, the circle, and we can see the kernels that are running for each 現在這些筆記本都非常小。希望您不會遇到任何內存問題,但如果您遇到了,我們可以單擊此處的圓圈,我們可以看到每個正在運行的內核 of these notebooks. 這些筆記本。 Each notebook is a bit of a memory hog, so if we shut that down, that'll release the memory to the rest of the GPU. 每個筆記本都有點佔用內存,所以如果我們關閉它,就會將內存釋放給 GPU 的其餘部分。 This would probably be a good time to go over the goals of this course. 這可能是回顧本課程目標的好時機。 Artificial intelligence and machine learning? They're huge fields. They can take years to master, so this course is kind of like a taster, a teaser of what’s to come. I like to think of it as a tutorial to an 人工智慧和機器學習?它們是巨大的領域,可能需要數年時間才能掌握,所以這門課程有點像一次淺嚐,一個對即將發生的事情的預告。我喜歡將其視為一個教程 open world game. 開放世界遊戲。 We might not be able to go through all the mechanics within the game, but it's enough to get 我們可能無法了解遊戲中的所有機制,但這足以讓 us started and have some fun. 我們入門並玩得開心。 If you're not a gamer and you're more of a gym person, I like to think of it as going to the gym for the first time in a long time. It might be challenging, and we might be a little sore tomorrow, but we'll have a better understanding of ourselves. 如果您不是遊戲玩家,而是喜歡去健身房,我喜歡將其視為很長一段時間以來第一次去健身房。這可能很有挑戰性,明天我們可能會有點酸痛,但我們會對自己有更好的了解。 It will also help us get a better idea of what we want to work on, whether it's our arm muscles or our leg muscles, or in this case, our math and computer science muscles. OK, with that out of the way, I think it's time to jump into the theory. So, I'm going to jump off camera. OK good luck. 它還將幫助我們更好地了解我們想要鍛鍊什麼,無論是我們的手臂肌肉還是腿部肌肉,或者在這種情況下,我們的數學和電腦科學肌肉。好吧,既然這樣,我想是時候開始討論理論了。所以,我要離開鏡頭了。好的,祝你好運。 So, we hope you approach this material like a fun challenge.
If you take part in the coding exercises, you'll have access to the material for up to a year. So, it's almost like you have infinite lives. 因此,我們希望您將本資料視為一項有趣的挑戰。如果您參加編碼練習,您將可以在長達一年的時間內存取這些資料。所以,這幾乎就像你有無限的生命。 The technical term for learning while having fun is called relaxed alertness. Our brains effectively have a gate between when we're executing versus when we're learning. Much like how neural nets and machine learning models have training and prediction phases. 寓教於樂的專業用語稱為放鬆警覺。我們的大腦在執行和學習之間有效地設有一扇門。就像神經網路和機器學習模型如何具有訓練和預測階段。 So with all that human psychology out of the way, let's go into robot psychology. 講完人類心理學之後,讓我們進入機器人心理學。 We'll start with the prologue to our story. How did we get to where we are today? 我們將從故事的序言開始。我們是如何走到今天這一步的? Humans have been trying to teach computers to perform tasks since their invention. 自從電腦發明以來,人類一直在嘗試教導電腦執行任務。 Early in the days in computers, many people were convinced that they would achieve human level intelligence within a few decades. 在電腦出現的早期,許多人堅信他們將在幾十年內達到人類水平的智慧。 It turned out that generalized human intelligence was beyond the grasp of computers at the time. 事實證明,當時的電腦無法掌握廣義的人類智慧。 During World War 2, many scientists and engineers tried to find a standard for building a computer. They played with the idea of neural networks, but they lost out to the reliability and efficiency of the Von Neumann Architecture. 第二次世界大戰期間,許多科學家和工程師試圖找到建造電腦的標準。他們嘗試了神經網路的想法,但他們輸給了馮諾依曼架構的可靠性和效率。 I'm not going to get into the Von Neumann Architecture because that's the lengthy and fascinating topic tangential to machine learning, so I'll leave some leave some references in the notes instead. 我不打算討論馮諾依曼架構,因為這是一個與機器學習無關的冗長而有趣的話題,所以我會在筆記中留下一些參考資料。 Long story short, is the basis for modern day computer processing units. 長話短說,是現代電腦處理單元的基礎。 As of recording this, chess is popular because of a show called Queen's Gambit, so I'll share a chess 截至錄製本文時,國際象棋因一個名為“皇后棄兵”的節目而流行,所以我將分享一個國際象棋 story about neural networks. 
關於神經網路的故事。 The early 90s was an exciting time for AI because it was the first time a computer was able to defeat a reigning world champion of chess. The chess machine, Deep Blue, used mostly brute force calculation, with many hard coded rules that specialists and programmers collaborated on to limit the board positions Deep Blue computed when looking ahead. 90 年代初期對人工智慧來說是一個激動人心的時期,因為這是電腦第一次能夠擊敗國際象棋衛冕世界冠軍。西洋棋機器「深藍」主要使用暴力計算,專家和程式設計師合作制定了許多硬編碼規則,以限制「深藍」在預測時計算的棋盤位置。 After this momentous occasion, the AI and Game world asked what's next? 在這重要時刻之後,人工智慧和遊戲界問接下來會發生什麼? So they turned their eyes to backgammon. Backgammon is similar to chess, in that it's a two player turn based game. But it differed in that many pieces can be moved in a turn and it involved probability with rolling dice. The brute force approach of Deep Blue had difficulties dealing with games of chance, so neural networks were explored. 於是他們把目光轉向了雙陸棋。雙陸棋與西洋棋類似,是兩人回合製遊戲。但它的不同之處在於,一個回合內可以移動多個棋子,而且擲骰子帶來了機率成分。「深藍」的暴力計算方法在處理機會遊戲時遇到困難,因此人們開始探索神經網路。 The neural network did surprisingly well, becoming a top player and even helping to discover new strategies. But because it wasn't the best in the world, people still saw neural networks as toys. 神經網路的表現出乎意料地好,成為頂級玩家,甚至幫助發現新策略。但由於它不是世界上最好的,人們仍然將神經網路視為玩具。 Deep Blue is an example of an expert system. These systems are meant to mimic a human subject matter expert by hard coding many rules and took hundreds to thousands of engineers to make. Up until recently, this was the dominant strategy of automated decision making. 深藍是專家系統的一個例子。這些系統旨在透過對許多規則進行硬編碼來模仿人類主題專家,並需要數百至數千名工程師來製作。直到最近,這還是自動化決策的主導策略。 However, these systems have limitations. 然而,這些系統有其限制。 Most importantly, rules for these systems had to be understood, defined, and programmed by humans. 最重要的是,這些系統的規則必須由人類理解、定義和編程。 The computer could only do as much as a human engineer was able to understand and turn directly into code.
電腦只能做人類工程師能夠理解並直接轉換為程式碼的事情。 For some well-defined tasks this was fine, but for some tasks that are simple to humans it was extremely difficult to build rules that a computer could follow. 對於一些明確定義的任務來說這很好,但對於一些對人類來說很簡單的任務,建立電腦可以遵循的規則極其困難。 As an example, for each of these pictures, think of one word that describes them. 例如,對於每張圖片,想出一個詞來描述它們。 For the picture on the left, you can describe it as ocean, wave, or blue. 對於左邊的圖片,你可以將其描述為海洋、波浪或藍色。 For the right, we can call it taxi or car or headlight, and for the center we could say it's cat or kitty or cute. 對於右側,我們可以稱之為計程車、汽車或車頭燈,對於中間那張,我們可以說它是貓、小貓或可愛。 For most adults, this probably isn't so tricky. 對於大多數成年人來說,這可能不那麼棘手。 But how would we teach a newborn child, let alone an artificial intelligence, how to do this? 但我們要如何教會一個新生兒做到這一點,更不用說人工智慧了? With children, we expose them to lots of data and help them by supplying the correct answer. A very popular learning technique is trial and error, where students make a prediction on what the correct answer is and if they're wrong, adjust their approach. It turns out that deep learning is very much like this. 對於孩子,我們向他們提供大量數據,並透過提供正確答案來幫助他們。一種非常流行的學習技巧是反覆試驗,學生先預測正確答案是什麼,如果錯了,再調整方法。事實證明,深度學習與此非常相似。 So if neural networks weren’t used for a long time, what changed? 那麼,既然神經網路長期沒有被使用,是什麼改變了呢? The first is data. In order for our neural network to learn, it must be exposed to lots of different data. 首先是數據。為了讓我們的神經網路能夠學習,它必須接觸大量不同的數據。 For a neural network to learn what a cat is, it needs to see many pictures of cats as well as pictures with no cats. 為了讓神經網路了解什麼是貓,它需要看到許多有貓的圖片以及沒有貓的圖片。 Thanks to the Internet, there's now lots of data to learn from. 感謝互聯網,現在有大量數據可供學習。 The other major factor is computing power. 另一個主要因素是運算能力。 If we're effectively training an artificial brain, let's think about our own for a second. 如果我們正在有效地訓練人造大腦,那麼讓我們考慮一下我們自己的大腦。 How many gigabytes of information do we observe in a given second? 我們在給定的一秒內觀察到多少 GB 的資訊? What's the frame rate of life?
What is the resolution of the human eye? While modern AI is impressive, we humans process significantly more data in a second than an AI does. 生活的幀率是多少?人眼的分辨率是多少?雖然現代人工智慧令人印象深刻,但我們人類在一秒鐘內處理的數據比人工智慧多得多。 One way machines can try to catch up to us humans is with higher computing power. 機器試圖趕上人類的一種方式是擁有更高的運算能力。 It turns out that under the hood, neural networks are matrix multiplication machines. What other types of problems require many matrix multiplications? Computer graphics! With computer graphics objects are represented as many tiny triangles working together to give the illusion of something more. 事實證明,神經網路在本質上是矩陣乘法機。還有哪些類型的問題需要多次矩陣乘法?電腦圖形學!在電腦圖形學中,物件被表示為許多微小的三角形,它們共同作用,給人一種更多東西的錯覺。 With animation, simulation, and gaming, these tiny triangles are often transformed and rotated, requiring matrix manipulation. 在動畫、模擬和遊戲中,這些微小的三角形經常被變換和旋轉,需要矩陣運算。 Since neural networks and graphics are both built on top of the same fundamental mathematical problem, 由於神經網路和圖形都建立在相同的基本數學問題之上, it's no surprise that the hardware for one can effectively be used for the other. 毫不奇怪,一個硬體可以有效地用於另一個。 An especially important invention in tackling the training of neural networks was the parallel processor. The math required to train a neural network is not especially complex, but calculations must be performed millions, billions, or even trillions of times. 處理神經網路訓練的一項特別重要的發明是並行處理器。訓練神經網路所需的數學並不是特別複雜,但計算必須執行數百萬、數十億甚至數萬億次。 While a CPU might have 10s of cores, a GPU can have hundreds to thousands, allowing for faster parallel processing of this task. CPU 可能有 10 個核心,而 GPU 可以有數百到數千個核心,從而可以更快地並行處理該任務。 So what is deep learning and 那什麼是深度學習和 what makes it special? 它有何特別之處? Deep learning flips traditional programming around. 深度學習顛覆了傳統程式設計。 For example, normally to build a classifier, we come up with the set 例如,通常要建立一個分類器,我們會提出集合 of rules and 的規則和 program those rules. 對這些規則進行程式設計。 Then when we want to classify something, we give our program a piece of data and the rules are used to select a category. 
然後,當我們想要對某些東西進行分類時,我們給我們的程式一段數據,然後使用規則來選擇類別。 With deep learning, you first need a list of variables, inputs and their corresponding outputs. 對於深度學習,您首先需要一個變數、輸入及其對應輸出的清單。 We don't need to know the relationship between the inputs and the outputs, but the more accurate we can observe their values, the better. 我們不需要知道輸入和輸出之間的關係,但我們觀察它們的值越準確越好。 We'll call list of examples the dataset. 我們將範例清單稱為資料集。 We then train the model by showing it the dataset and the correct outputs. 然後,我們透過向模型顯示資料集和正確的輸出來訓練模型。 The model keeps taking guesses and finds out if it 該模型不斷猜測並找出是否正確 was right or not. 正確與否。 It slowly learns to correctly categorize over the course of training. 它在訓練過程中慢慢學會正確分類。 The fundamental difference is that instead of humans needing to identify and program the rules, the deep learning algorithm will learn them on its own. 根本區別在於,深度學習演算法將自行學習規則,而不是人類需要識別和編程規則。 This is a fundamental shift in the way we build software systems. 這是我們建構軟體系統方式的根本轉變。 Deep learning is not always the right choice. 深度學習並不總是正確的選擇。 If the rules 如果規則 are clear and straightforward. It's often better to just program it. 清晰明了。通常最好只是對其進行編程。 However, if the rules are nuanced or complex, and humans would have a hard time describing them, let alone programming them, deep learning is a great choice. 然而,如果規則微妙或複雜,人類很難描述它們,更不用說對其進行編程,那麼深度學習是一個不錯的選擇。 One thing that separates deep learning from traditional machine learning is the depth and complexity of the networks that are used for especially complicated tasks such as natural language understanding, networks can have billions of parameters. 深度學習與傳統機器學習的區別之一是網路的深度和複雜性,用於自然語言理解等特別複雜的任務,網路可以擁有數十億個參數。 The ‘deep’ in 裡面的‘深處’ deep learning refers to many layers of the model that learn the important rules necessary for completing a task. 深度學習是指模型的許多層,它們學習完成任務所需的重要規則。 These are often called hidden layers. 這些通常稱為隱藏層。 Deep learning has only been a major factor in the last decade or so, but it has had a huge impact in many industries. 
深度學習在過去十年左右才成為一個主要因素,但它已經對許多行業產生了巨大影響。 The field of computer vision has been around long before the field of deep learning, but because there are now more and more tagged high-quality images available, deep learning is having a large influence. 電腦視覺領域早在深度學習領域出現之前就已存在,但由於現在有越來越多的標籤的高品質影像,深度學習正在產生很大的影響。 The goal of computer vision is to teach machines to see and perceive images the way that humans do. Thanks to the complexity of our eyes, not to mention the processing our brains do with these images. This is no easy task. 電腦視覺的目標是教導機器像人類一樣觀察和感知圖像。由於我們眼睛的複雜性,更不用說我們的大腦對這些圖像的處理。這不是一件容易的事。 Natural language processing has 自然語言處理有 helped make huge shifts. For example, on highly accurate real time language translation, as well as voice recognition, and virtual assistance. 幫助實現了巨大轉變。例如,高度準確的即時語言翻譯,以及語音辨識和虛擬協助。 Recommender systems based on deep learning fuel the Internet with content curated feeds such as Facebook, music like Spotify, video like YouTube and Netflix, as well as targeted online advertising and shopping recommendations such as Amazon. 基於深度學習的推薦系統透過 Facebook 等精選內容、Spotify 等音樂、YouTube 和 Netflix 等影片以及 Amazon 等有針對性的線上廣告和購物推薦為網路提供動力。 Reinforcement learning has achieved incredible results in matching human expert performance, such as 強化學習在匹配人類專家表現方面取得了令人難以置信的成果,例如 the AI AlphaGo, 人工智慧阿爾法狗, which beat the world champion in what is considered one of the most difficult and complex games in the world, and AI bots are now also taking on world experts in complex video games such as StarCraft and Dota. 在被認為是世界上最困難和最複雜的遊戲之一中擊敗了世界冠軍,人工智慧機器人現在也在複雜的視頻遊戲(例如《星海爭霸》和《Dota》)中與世界專家較量。 Before we get too much into the theory and specifics of deep learning, we're going to act like an AI and play with the neural network right away. See if you could figure out how we went about constructing the model. Feel free to experiment and alter the code. 
在我們深入了解深度學習的理論和細節之前,我們將像人工智慧一樣立即使用神經網路。看看您是否能弄清楚我們是如何建立模型的。請隨意嘗試和更改程式碼。 You can always restart the GPU instance in order to get a fresh start 您可以隨時重新啟動 GPU 執行個體以重新開始 if your curiosity gets a little too crazy. 如果你的好奇心變得有點太瘋狂了。 Ready to give it a try? Let's get started. 準備好嘗試了嗎?讓我們開始吧。 </details> ## 2. The Theory Behind Neural Networks 神經網路背後的理論 <details> <summary>摘要</summary> --- ## Theory 理論 --- ### 進入深度學習的基礎 --- ## 模型概述與數據處理流程 --- ### Step-by-Step 模型建立步驟 OK, 那麼剛才發生了什麼事?讓我們一步一步地看一下。 **OK, so what just happened? Let's go over it step by step.** 我們加載並可視化了數據,然後將數據編輯成正確的形狀,並將數值標準化到 0 到 1 之間。我們還將答案編輯為分類形式,而不是數字。 **We loaded and visualized our data, edited it to get the right shape, and normalized the values between zero and one.
We also edited our answers to be in a categorical form instead of a number.** ### 模型架構細節 我們的數據每個是 28 x 28 像素影像,作為 784 個整數的陣列。 **Each 28 by 28 pixel image is received as an array of 784 integers.** --- ## 網路結構與神經元互聯 --- ### 神經網路的結構描述 - 784 個輸入層神經元 - 兩層隱藏層,每層包含 512 個神經元 - 10 個輸出神經元(對應數字0至9) 每層中的每個神經元都連接到下一層中的每個神經元。 **Each neuron in each layer is connected to every neuron in the next layer.** --- ## 深度學習中的數學基礎 --- ### 迴歸與最小平方誤差 迴歸是指使用輸入來預測連續輸出。 **Regression is when we use an input to predict a continuous output.** 我們的目標是找到一條可以經過這些點的線。 **Our goal is to find a line that could pass through these points.** 傳統迴歸線建立在最小平方誤差的概念上。 **Traditional regression lines are built on the concept of least squared error.** ### 機器學習中的梯度下降法 這個過程叫做梯度下降法。 **This process to find the minimum loss is called gradient descent.** --- ## 激活函數的重要性 --- ### 常用的激活函數 在此,我們介紹兩種激活函數: - **ReLU(修正線性單元):** 將負輸出設為0 - **Sigmoid:** 將數值壓縮為0至1之間,適合機率估算 --- ## 深度學習中的過度擬合問題 --- ### 過度擬合與模型驗證 如果模型在訓練數據上的表現良好,但在新數據上表現不佳,我們稱之為過度擬合。 **If the model performs well on training data but not on new data, we call this overfitting.** 為了避免過度擬合,我們使用訓練數據和驗證數據。 **To avoid overfitting, we use both training and validation data.** --- ## 核心損失函數:交叉熵 **交叉熵(Cross Entropy):** 我們使用交叉熵來量化模型預測的準確性。如果模型預測錯誤,將得到很高的損失。 **We use cross entropy to measure the accuracy of model predictions. If the model is wrong, it receives a high loss.** 這是二分類交叉熵的完整數學公式。 **Here's the full mathematical formula for binary cross entropy.** --- ## 準備好進入實驗 **即將進行新的數據集實驗:美國手語字母** **New dataset experiment: American Sign Language alphabet.** 我們將跳過 J 和 Z,因為這些字母涉及手部移動。 **We will skip J and Z, as these letters require movement.** 感到對神經網路更有信心了嗎?讓我們開始吧! **Feeling more confident about neural networks? Let's get to it.** --- </details> <details> <summary>中英原文</summary> Theory 理論 OK, so what just happened? Let's go over it step by step. 好吧,那麼剛才發生了什麼事?讓我們一步一步地看一下。 We loaded and visualized our data… 我們加載並可視化我們的數據......
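The preprocessing summarized above (flattening each 28 by 28 image into 784 values, dividing by 255, and one-hot encoding the labels) can be sketched in plain NumPy. This is a minimal illustration; the array names are ours, not the ones used in the course notebooks:

```python
import numpy as np

# Stand-in for a batch of two 28x28 grayscale images with pixel values 0-255
# (in the lab these come from the MNIST digits dataset).
images = np.random.randint(0, 256, size=(2, 28, 28))
labels = np.array([3, 7])

# Flatten each image into an array of 784 integers.
x = images.reshape(len(images), 784)

# Normalize by dividing by 255, the max value for a pixel, so values lie in [0, 1].
x = x / 255.0

# Convert the numeric answers to categorical (one-hot) form:
# the digit 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].
y = np.eye(10)[labels]
```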
We edited our data to get into the right shape as well as normalize the values between zero and one. We also edited our answers to be in a categorical form instead of a number, as our model will better train with that type of answer. 我們編輯了數據以形成正確的形狀,並將數值標準化為 0 到 1 之間。我們還將答案編輯為分類形式而不是數字,因為我們的模型將更好地使用這種類型的答案進行訓練。 We then created our model, 然後我們創建了我們的模型, compiled our model and trained our model on the data. 編譯我們的模型並根據資料訓練我們的模型。 Let's start with our data. We received each 28 by 28 pixel image as an array of 784 integers. 讓我們從我們的數據開始。我們收到每個 28 x 28 像素影像作為 784 個整數的陣列。 We then normalize that data by dividing it by 255, the max value for a pixel. 然後,我們將資料除以 255(像素的最大值)來標準化該資料。 Let's talk about our model. We can't fully represent it in a slide because it has many connections. 讓我們談談我們的模型。我們無法在一張幻燈片中完整地展示它,因為它有很多聯繫。 Remember that there are 784 inputs, 2 layers of 512 neurons and an output layer of 10 neurons, one for each digit. 請記住,有 784 個輸入、2 個由 512 個神經元組成的層和一個由 10 個神經元組成的輸出層,每個神經元對應一個數字。 Every neuron in each layer is connected to every neuron in the next layer. 每層中的每個神經元都連接到下一層中的每個神經元。 With this we have a lot of connections. 有了這個,我們就有了很多連結。 784 times 512. 784 乘以 512。 Plus 512 times another 512. 加 512 乘以另一個 512。 Plus 512 times 10. 加 512 乘以 10。 Looking at this, can you guess what the approximate magnitude is? 看到這個,你能猜出大概的大小是多少嗎? We have 668,672 connections. 我們有 668,672 個連接。 In order to understand how this network came about, let's look at a simpler model. 為了理解這個網路是如何產生的,讓我們來看一個更簡單的模型。 Each of the circles on the slide before is a neuron. 前面幻燈片上的每個圓圈都是一個神經元。 The earliest use case of this neuron was to build a regression line. Good old Y equals MX plus B. 這個神經元最早的用例是建立迴歸線。好的舊 Y 等於 MX 加 B。 Regression is when we use an input to predict a continuous output, like using how many seconds of pot of water has been sitting on a stove to predict the temperature 迴歸是指我們使用輸入來預測連續輸出,例如使用一壺水在爐子上放置了多少秒來預測溫度 of the water. 
的水。 For each variable coming into our neuron, we're going to find a 對於進入神經元的每個變量,我們將找到一個 slope or M. 斜率或 M。 We will also find our Y intercept or B which is a constant value not impacted by the other variables. 我們也會發現 Y 截距或 B,它是一個不受其他變數影響的常數值。 Our goal is to find a line that could pass to these points. That way we can use that line to make predictions. 我們的目標是找到一條可以經過這些點的線。這樣我們就可以使用這條線來進行預測。 Some of you that have taken statistics before probably know of a more deterministic way to get M and B. We can verify through algebra that M is equal to 2 and B is equal to 1. 有些學過統計學的人可能知道有一種更確定的方法來獲得 M 和 B。 We're going to do a more trial and error approach. 我們將採取更多的嘗試和錯誤方法。 We're going to pick random values to start, usually between -1 and 1. 我們將選擇隨機值開始,通常在 -1 和 1 之間。 But we're going to pick 5 for B and -1 for M to make things easier to visualize. 但我們將為 B 選擇 5,為 M 選擇 -1,以使事情更容易視覺化。 The Y hat here, the one with the little hat, is going to be our estimate of the true value, Y. 這裡的 Y 帽子,即帶有小帽子的帽子,將是我們對真實值 Y 的估計。 Most of the time, our guess is going to be a bit off. So how do we correct it? 大多數時候,我們的猜測會有點偏差。那我們該如何糾正呢? Traditional regression lines are built on the concept of least squared error, meaning we're going to take the difference of each estimate from the true value and square it. This keeps the error positive, so when we figure out our total error, they don't cancel each other out. It also strongly punishes bad guesses. 傳統的迴歸線建立在最小平方法誤差的概念之上,這意味著我們將計算每個估計值與真實值的差異並對其進行平方。這會使誤差保持為正,因此當我們計算出總誤差時,它們不會相互抵消。它還會嚴厲懲罰錯誤的猜測。 If it's been a while since seeing some of these math symbols, no problem, we can code our root mean squared error or RMSE like this. 如果自從看到這些數學符號以來已經有一段時間了,沒問題,我們可以像這樣編碼我們的均方根誤差或 RMSE。 For each of our points, we'll figure out what the difference is between Y and MX plus B. 對於每個點,我們將找出 Y 和 MX 加 B 之間的差異。 Then we will square that difference and then take the average of all the squared differences. 
然後我們對該差值進行平方,然後取所有平方差的平均值。 OK, phew, that was the mathiest we're going to get today. 好吧,唷,這是我們今天會碰到的最數學的部分了。 Let's graph out our M and B in relation to our mean squared error. On the left we have a surface plot of our loss function. 讓我們繪製出 M 和 B 與均方誤差的關係。左邊是損失函數的曲面圖。 That's the error function that we choose to evaluate our model, which in our case is the mean squared error. 這就是我們選擇用來評估模型的誤差函數,在我們的例子中是均方誤差。 It's a little hard to read, so on the right we've plotted the contours. 它有點難以閱讀,所以我們在右側繪製了等高線。 With that, we can see it's actually a kind of parabolic ellipse that bottoms out at B is equal to 1 and 這樣,我們可以看到它實際上是一種拋物線橢圓,在 B 等於 1 時觸底,並且 M is equal to 2. M等於2。 If this is our guess line, we can see where our current position is on the curve. 如果這是我們的猜測線,我們可以看到我們目前的位置在曲線上的位置。 Our goal is to find the minimum error and since it's a toy example, we know that the minimum is 0. 我們的目標是找到最小誤差,由於這是一個玩具範例,我們知道最小值為 0。 For us humans, it's easy to see that the bottom is around M is equal to 2 and B is equal to 1. 對我們人類來說,很容易看出底部大約在 M 等於 2、B 等於 1 的地方。 But for the computer, it's not so easy to see that. 但對於計算機來說,就不太容易看到這一點。 The computer can't visualize the loss curve without some help. Turns out the computer isn't the best at calculus, 如果沒有幫助,計算機無法可視化損失曲線。事實證明計算機不擅長微積分, So we're going to help it cheat. 所以我們要幫助它作弊。 We have two levers to help us move, M and B. 我們有兩個槓桿來幫助我們移動,M 和 B。 We could change B, our intercept, or M, our slope. Let's start with B. 我們可以改變截距 B,或斜率 M。我們先從 B 開始吧。 If we subtract 1, 如果我們減去 1, Cool. Our line hits one of the points now. 酷。我們的線現在經過其中一個點了。 But it's still missing the other one. Let's increase the slope by 1. 但它仍然沒有經過另一個點。讓我們將斜率增加 1。 Darn it, now we're missing both points, but the MSE did decrease from 2.5 to 1. 該死,現在兩個點都沒經過,但 MSE 確實從 2.5 下降到了 1。 The computer doesn't have our human eyeball ability to guesstimate what it should change in order to get to the bottom of this bowl.
電腦沒有我們人類的眼球能力來猜測它應該改變什麼才能找到這個碗的底部。 So what it's going to do is, given the current position that it's at, it's going to calculate the gradient, which is a fancy way to describe a multivariate slope. 所以它要做的是,給定當前位置,它將計算梯度,這是描述多元斜率的一種奇特方式。 It's going to figure out in which direction the loss is decreasing the most, which is going to be some ratio 它將找出哪個方向的損失減少最多,這將是 of B to M. B 與 M 的某個比率。 Once we have a direction, we need to figure out how far to 一旦有了方向,我們就需要弄清楚 travel with it. 要沿著它走多遠。 The gradient is going to change depending on our current position, so we don't want to make too big of a step because we can accidentally move farther away from our target. 梯度將根據我們目前的位置而變化,因此我們不想邁出太大的一步,因為我們可能會意外地遠離目標。 We also don't want the step to be too small because that means it can take a long time to get to our goal, and life is too short to wait for our model to finish training. 我們也不希望步伐太小,因為這意味著需要很長時間才能達到目標,而人生苦短,等不起模型慢慢完成訓練。 The amount we travel is something we decide and it's in a parameter called the learning rate. Now that we're in a new position, we repeat the process, find the new gradient, and take another step. There's a few ways to measure our progress. There are epochs, which is another way to say 1 pass through all the data. It used to be that people calculated the gradient using the full data set, but nowadays people see that it is more efficient to use a sample or batch from the dataset. In either case, a step is when we take either the full data set or a batch, calculate the gradient, and update our parameters with the gradient and learning rate. After a few steps, this is what our journey will look like. This process to find the minimum loss is called gradient descent. Thankfully, a lot of research has been done on the best way to define the learning rate and machine learning frameworks have a few tools that will automatically adjust the learning rate for us.
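The loop just described — compute the gradient at the current position, scale it by the learning rate, and step downhill — can be sketched for the toy line-fitting example. The starting guess M = -1, B = 5 and the target M = 2, B = 1 come from the lecture; the data points, learning rate, and step count below are our own illustrative choices:

```python
import numpy as np

# Points lying on the true line y = 2x + 1 from the toy example.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1

m, b = -1.0, 5.0       # the deliberately bad starting guess
learning_rate = 0.05   # how far we travel along the gradient each step

for step in range(2000):
    y_hat = m * x + b            # our current estimate of y
    error = y_hat - y
    # Gradient of the mean squared error with respect to m and b.
    grad_m = np.mean(2 * error * x)
    grad_b = np.mean(2 * error)
    # Step downhill, scaled by the learning rate.
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

# m and b converge toward the true slope 2 and intercept 1.
```

Making `learning_rate` much larger eventually overshoots the bowl and diverges, while making it much smaller needs far more steps — the trade-off described above.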
For instance, a popular one is Adam or adaptive momentum, which is kind of like thinking of our loss curve as a mountain and our position as a marble. If we drop a marble on top of a mountain, it will pick up speed, jumping over trenches before hopefully landing on a lower minimum. Now that we have the core concepts down, we could start building up our network. Instead of having a single X input with the slope M, we're going to have multiple X inputs, X1 X2 X3 and so on, and find each of them a weight. Our previous math stays mostly the same. We just need to find the gradient using the new variables like we did before. We can also take the output of one neuron and feed it into another, and as long as we don't make a loop, we can connect the same output to multiple inputs. Badum! With this, we officially have a deep learning network. When we calculate our new weights from gradient descent, we can use the error calculated in a later neuron as part of the error for the previous neuron it's connected to. Most frameworks automate this part for us, so we're not going to go too much into the math. But I'll leave some links in the notes that explain how it works. There is one mathematical gotcha we should keep an eye out for, and that's our activation function. Right now, we're just doing a linear sum, so we get a line in 2D or a plane in 3D. This is a bit of a waste on the neural network, since the only output we'll get from adding up lines together is another line, and it would be nice to be able to capture more complex shapes. Here's the strategy. Computers like equations of a line because they're quick to compute and it's easy for us to give them the rules on how to differentiate them. One easy way to add nonlinearity is to feed our equation of a line into another nonlinear function. One of the most popular ones is ReLU, the rectified linear unit. This is just a fancy way of saying whenever the output of our line is negative, we're going to set it to 0.
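As a quick sketch, ReLU really is that simple to write; the input values below are made-up outputs of a neuron's linear sum:

```python
import numpy as np

def relu(z):
    # Rectified linear unit: whenever the line's output is negative, set it to 0.
    return np.maximum(0, z)

# Made-up outputs of a neuron's linear sum (w1*x1 + w2*x2 + ... + b).
z = np.array([-3.0, -0.5, 0.0, 1.5, 4.0])
activated = relu(z)  # negatives become 0; positive values pass through unchanged
```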
Because the output of our neuron gets assigned a weight if it's fed into another neuron, there's a chance the output will become negative. Another important activation function is sigmoid, which manipulates our line so we get this S shape that goes from 0 to 1. This is great because then we can use our neurons to assign probabilities to predictions. All of these graphs start with the linear plane that gets squeezed through an activation function. In actuality, we're likely to have more than one input into our neuron, and this is the shape these activation functions give our neurons in three-dimensional space. Linear is just a plane like a sheet of paper. ReLU is like a wedge, and sigmoid is like a surf wave. Here's how an understanding of all this math is going to help us. Let's say we have a one input, one output data set, and we suspect it’s the summation of two waves. We can use a sinusoidal activation function in our first layer and a linear activation in our last layer so we can add the two. Given enough data, our neural network will figure out the parameters of each of these subcomponents for us. Having a general understanding of the shape of the data and the relationship between the variables can help us build a more efficient model for our data, saving time, computation, and ultimately, money. If we can make these complicated shapes with more and more layers and neurons, why not make a super big neural network to solve every problem? Can you think of any of the pitfalls that can happen from having too big of a model? We could be wasting a lot of computational resources, and it could also take longer to train, but there is another problem we should be aware of. The problem goes back to classical statistics, but still plagues neural networks. Let's say we've gone out, collected some data, and created two models. Which one of these is better? Going by the root mean squared error, it's the one on the left. But what if I were to go out and get a new sample of data?
It turns out our simple model does a better job generalizing to new data. Not all problems can be so simple. It is the job of the data scientist to be able to determine the correct complexity for the problem. Remember when we had two separate pools of data, training and validation, so that we could be sure the network is not just memorizing the data it's learning on? We need to make sure that when it sees new and different images, it is accurate as well. If the model performs well on training data but not on new data, we call this overfitting. We just went over many parts of a neural network. Let's see how it applies to the last lab. This was the visualization of the model used in the previous lab. Can you name some of the components that we just learned on this diagram? Any details missing? Here's a more detailed version of the diagram now, including activation functions. We use ReLU in our hidden layer to add efficient nonlinearity. But why use sigmoid at the end? This way we can get a probability for each digit. But wouldn't it be useful if all the probabilities for all ten of our digits added up to 100%? How can we do that? Not a problem. There's a function many machine learning libraries have called softmax. That’s going to add the results of all the sigmoid functions across the layer and then reweight them so that they all add up to 100%. Let's say we have a group of points that are categorized by color. How would we use a line fit with root mean squared error to predict whether a new data point is going to be red or blue? This time, we want to find a line that best separates the data. The error function that we use to calculate our gradient descent is called the loss function or cost function. We could try to assign numbers to these colors and use root mean squared error, but there is a better way. The most popular way to approach this problem is with categorical cross entropy. Before we dive into the math, let's talk about the logic.
We're going to assign a probability to each point, whether it's red or blue. We could almost take this probability as our confidence it is a particular class. Cross entropy was designed like this. If we are 100% confident that a point is blue and we're right, perfect: we have zero loss. However, if we're wrong, we should be punished for our hubris and receive infinite loss. This works in the reverse case as well. If we assign a point a 0% chance of being blue, that is, we're absolutely certain it is red, and it is indeed not blue, we get no loss. However, if it is blue, we'll have infinite loss. Here's the full mathematical formula for binary cross entropy, meaning we only have two classes. The logarithms are used to get our near infinite loss. It might seem like a lot, but there's a little trick. The left half cancels to 0 when the target is false and the right half cancels to 0 when the target is true. Here's the code version for all you programmers out there. This is the same math as the slide before, but in code form. Y hat is our prediction that a point is blue and Y actual is whether a point was actually blue or not. The logarithms are what will push our code to infinity, as log 0 is negative infinity. OK, now that we know the major mechanics, let's talk about the next lab. We're going to try this again with a different data set. The American Sign Language alphabet. The images will have the same shape as last time, 28 by 28 pixels. We're going to be skipping J and Z because they require movement. Feeling more confident about neural networks? Let's get to it. Please click the start button to the top-right. The course will take about 5 minutes to load. </details> ## 3.
Convolutional Neural Networks 卷積神經網絡 <details> <summary>摘要</summary> --- ## 理論介紹:卷積神經網路 ### 理論概述 卷積神經網絡是一種允許機器更接近人類方式感知圖像的技術。 **Convolutional Neural Networks (CNNs) allow machines to perceive images more closely to the way humans do.** --- ### 傳統的電腦視覺基礎 在神經網路出現之前,電腦視覺專家使用內核來處理圖像中的特徵,這與我們在 Photoshop 等繪圖軟件中的工具操作方式相似。 **Before neural networks, computer vision experts used kernels to manipulate image features, much like how Photoshop tools operate.** --- ## 卷積 (Convolution) 與內核 (Kernel) ### 卷積的基本概念 卷積是將內核(一個小矩陣)應用於圖像的一部分,然後將結果累加起來。 **Convolution applies a kernel (small matrix) to a part of an image, then sums the results.** - **模糊 (Blur)**:使用內核中的加權平均值來柔化圖像。 - **銳利化 (Sharpen)**:強調中心像素並減去相鄰像素。 - **變亮 (Brighten)**:增加中心像素的亮度。 --- ## 卷積參數與效果 ### 視窗、步幅 (Stride) 與填充 (Padding) - **步幅 (Stride)**:指移動內核時的像素數量。步幅越大,產生的數據越少,但可能會遺漏重要信息。 - **填充 (Padding)**:添加額外的邊框(如零填充)來確保所有像素參與卷積計算。 --- ## 卷積神經網路的工作原理 ### 從內核到神經網路的聯結 卷積神經網路的特徵在於它們將神經元的輸入映射到具有可訓練權重的內核上。 **CNNs map neuron inputs to kernels with trainable weights, similar to how neurons process information.** - 多層卷積可以提取圖像中的不同層次特徵,逐層構建直至完整的物體識別。 --- ## 卷積神經網路中的特徵提取 ### 邊緣檢測與顏色梯度 特定內核可以檢測圖像中的邊緣和顏色變化方向,使得卷積網路可以識別圖案和形狀。 **Specific kernels can detect edges and color gradients, enabling CNNs to recognize patterns and shapes.** --- ## 增強網絡性能的技術 ### 池化 (Pooling) - **最大池化 (Max Pooling)**:在池化窗口內取最大值,減少圖像大小,從而減少計算量。 ### Dropout 隨機關閉部分神經元以防止過度擬合。 **Randomly shuts off neurons to prevent overfitting.** --- ### 批量標準化 (Batch Normalization) 在訓練期間標準化層間權重變化,防止內部協變量偏移。 **Normalizes weight changes between layers during training to prevent internal covariate shift.** --- ## 準備開始實驗:實作卷積神經網路 準備好動手了嗎?我們走吧。 **Ready to get your hands dirty? Let's go.** --- </details> <details> <summary>中英原文</summary> Theory 理論 All right. So let's talk about the previous lab. We made a dense neural network, and it did OK… 9 out of 10 accuracy. Not bad for a machine, but not as good as a human.
We also saw that the training accuracy was pretty high, but the validation accuracy was low, which means it probably was overfitting. So today we're going to do something 好的。那我們來談談之前的實驗室。我們製作了一個密集的神經網絡,效果還不錯……十分之九的準確率。以機器來說還不錯,但不如人類。我們也看到訓練精度相當高,但驗證精度很低,這意味著它可能過度擬合。所以今天我們要做點什麼 called convolutional neural networks, which is a fun technique which allows our machines to perceive images much more closely to the way that humans do. 稱為卷積神經網絡,這是一種有趣的技術,它使我們的機器能夠以更接近人類的方式感知影像。 Sounds pretty cool, right? OK, let's get going. 聽起來很酷,對吧?好的,我們開始吧。 Before we get into neural networks, let's talk about how computer vision experts used to approach this type of problem. If you played with Photoshop, MS Paint or GIMP a lot as a kid making gifs or memes or silly pictures, good news, understanding how computers represent and manipulate photos is a great foundation for modern day computer vision. 在我們討論神經網路之前,我們先來談談電腦視覺專家過去如何解決這類問題。如果您小時候經常使用 Photoshop、MS Paint 或 GIMP 製作 gif、meme 或愚蠢的圖片,那麼好消息是,了解電腦如何表示和操作照片是現代電腦視覺的重要基礎。 For example, let's take a look at the blur, sharpen, brighten and darken tools. In drawing programs, the area around where we click that manipulates our image is called the kernel. 例如,讓我們來看看模糊、銳利化、增亮和變暗工具。在繪圖程式中,我們按一下操作影像的周圍區域稱為核心。 Convolution is when our kernel is multiplied with our image and that is what's happening when we select these tools and drag it across our image. 卷積是指我們的內核與圖像相乘,這就是當我們選擇這些工具並將其拖曳到圖像上時發生的情況。 To be technical, convolution is when we apply a function to another function. But we could think of our base image and our kernel as functions of pixels and color. 從技術角度來說,卷積就是我們將一個函數應用於另一個函數。但我們可以將基礎影像和內核視為像素和顏色的函數。 The kernel is just a small matrix. 內核只是一個小矩陣。 Blurring is a weighted average of the numbers 模糊是數字的加權平均值 that represent the pixels. 代表像素。 Sharpen brightens the pixel in the center of it, but subtracts the pixels directly adjacent to it. 銳利化會使中心的像素變亮,但會減去與其直接相鄰的像素。 Brighten makes the center pixels' color values bigger.
變亮使中心像素的顏色值變大。 OK, so we have a picture of a heart. Not obvious. Sorry. I only had 6 by 6 pixels to work with. 好的,我們有一張心形的圖片。不明顯。對不起。我只有 6 x 6 像素可以使用。 We're going to take a step through a convolution. 我們將一步步示範卷積。 The fancy star in the center is a convolution operator. 中心的精美星星是一個卷積運算符。 The resulting image is going to be 4 by 4 pixels. 生成的圖像將為 4 x 4 像素。 Let's start in the top left corner. We'll call the section that we're about to convolve, the red box here, the window. Notice how it has the same shape as our kernel. 讓我們從左上角開始。我們將要進行卷積的部分(此處的紅色框)稱為視窗。請注意它與我們的內核具有相同的形狀。 We're going to take our kernel and multiply each cell in the kernel with the respective cell in the window. 我們將採用內核並將內核中的每個單元與視窗中的相應單元相乘。 We're then going to total the results of the multiplication. 然後我們將對乘法結果加總。 Next, we move the window over by 1… 接下來,我們將視窗移動 1... And we repeat the process until we've calculated the convolution for the whole image. 我們重複這個過程,直到計算出整個影像的捲積。 There are a few parameters we could play with for convolution. 我們可以使用一些參數來進行卷積。 One is the stride, which is by how many pixels we should move our window when doing convolution. 一是步幅,也就是在進行卷積時我們應該移動視窗多少像素。 Let's chop our heart in half to see what we mean. Oh no, a broken heart! 讓我們把心切成兩半來看看我們的意思。哦不,心碎了! Let's assume we have the same kernel as before. 假設我們有與以前相同的內核。 With a stride of two, we skip starting convolution at every other column and every other row. 跨步為 2,我們跳過每隔一列和每隔一行開始卷積。 When we do this, we also generate less data, which has pros and cons. 當我們這樣做時,我們也會產生更少的數據,這有利有弊。 The good is that it means we have less data to analyze. 好處是,這意味著我們需要分析的數據更少。 But if we increase our stride too much, we might miss important information. 但如果我們邁得太大,我們可能會錯過重要的訊息。 The other thing to look out for is, with 另一件需要注意的事情是 a stride of 2… 步幅為 2… We don't have enough columns to do a convolution for the right side of this image.
我們沒有足夠的列來對該圖像的右側進行卷積。 If we want enough data to make sure all pixels are used in convolution, 如果我們需要足夠的數據來確保所有像素都用於卷積, or if we want the resulting image to be the same size 或者如果我們希望生成的圖像大小相同 as our input image, 作為我們的輸入影像, we can do something called padding. 我們可以做一些叫做填充的事情。 A quick and 一個快速且 easy way to do this is called zero padding, where we add a border of zeros around our image. 實現此目的的簡單方法稱為零填充,即我們在圖像周圍添加零邊框。 This can be sufficient in many cases, but in some cases where the image is small, it could have an impact on the convolution. 這在許多情況下就足夠了,但在某些圖像較小的情況下,它可能會對卷積產生影響。 So… how does this apply to neural networks? 那麼……這如何應用在神經網路呢? Before neural networks, computer vision researchers would play around with the numbers within a kernel to see if it could be used to find any interesting features in the image. 在神經網路出現之前,電腦視覺研究人員會研究內核中的數字,看看它是否可以用來發現影像中任何有趣的特徵。 So far, we've only used three by three kernels, but they can be any size, and researchers would try to do things like make a kernel that returned a high value if it convolved a cat ear. 到目前為止,我們只使用了三乘三的內核,但它們可以是任何大小,研究人員會嘗試做一些事情,例如製作一個如果與貓耳進行卷積則返回高值的內核。 So you could have a cat ear kernel, a cat nose kernel, a cat eye kernel... 所以你可以有貓耳內核、貓鼻內核、貓眼內核… and if we ran all of those through an image and found matches, you might guess that there was a cat. 如果我們通過圖像運行所有這些並找到匹配項,您可能會猜測有一隻貓。 But as neural networks got more and more powerful, people made the connection that… 但隨著神經網路變得越來越強大,人們建立了聯繫… Wow. 哇。 Kernels are these devices that multiply weights to an input and then add the results together. 內核是將輸入的權重相乘然後將結果相加的設備。 What else does that? 那還有什麼作用呢? Oh yeah, neurons do that too! 哦,是的,神經元也這樣做! That's exactly what convolutional neural networks are. They map the inputs of a neuron to a kernel with trainable weights. 這正是卷積神經網路。他們將神經元的輸入映射到具有可訓練權重的內核。 This is a convolutional neural network architecture. 這是一個卷積神經網路架構。 It is a bit smaller than what we'll be doing in the lab, but the structure is similar.
它比我們在實驗室中要做的要小一些,但結構相似。 First we have our input image. We're going to start with a variable number of convolutional layers, in this case two three by three by one grayscale filters. 首先我們有輸入影像。我們將從可變數量的捲積層開始,在本例中是兩個三乘三乘一的灰階濾波器。 We can have as many as we want, and each one of them is going to produce a new image by trying to extract interesting features out of the original image. 我們可以擁有任意數量的影像,每個影像都會透過嘗試從原始影像中提取有趣的特徵來產生一幅新影像。 We're going to stack these images on top of each other, so we end up with a three-dimensional image, but not 3D in the way that we humans perceive it. 我們將這些圖像堆疊在一起,因此我們最終會得到一個三維圖像,但不是我們人類感知的 3D 圖像。 We could then send this new image through a new set of filters. When convolving, even though we're technically doing a 2D convolution because we're in black and white, we're actually convolving a 3D object that is our window size by the number of stacked images. Because we have two stacked images, our kernels are digesting a three by three by two slice of the stacked images. In the picture above, we have two, three by three by two filters. When we're done convolving, we'll flatten everything out and feed the results through our dense layers as normal. It used to be that data scientists would toy around with the kernel sizes, and that's still useful in some cases, but empirically, sticking with three by three works well. Let's go back to computer vision theory to understand why. When researchers first came up with these kernels before neural networks, they also realized that you can use them to essentially calculate the derivative of these images. What in the world does that mean? Well, there is a special class of kernels that can pick up edges, places where the color of an image is changing rapidly. For instance, the kernel on the left picks up vertical edges and the kernel on the right picks up horizontal edges. Notice how the horizontal lines are gone on the left? That's because if there's a horizontal line, there's no change.
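The convolution walkthrough and these two edge-detecting kernels can be sketched together. The kernels below are the classic Sobel pair; the exact numbers on the slide may differ, and the tiny test image is our own:

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    # Slide the window across the image, multiply it cell-by-cell
    # with the kernel, and total the results (a "valid" convolution).
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros(((ih - kh) // stride + 1, (iw - kw) // stride + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            window = image[r*stride:r*stride + kh, c*stride:c*stride + kw]
            out[r, c] = np.sum(window * kernel)
    return out

# Classic Sobel-style kernels: the first picks up vertical edges,
# the second picks up horizontal edges.
vertical_edges = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
horizontal_edges = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

# A tiny image with one hard vertical edge: dark left half, bright right half.
image = np.array([[0, 0, 1, 1]] * 4)

print(convolve2d(image, vertical_edges))    # strong response along the edge
print(convolve2d(image, horizontal_edges))  # all zeros: no horizontal change
```

The horizontal kernel returns zeros everywhere on this image, which is the "no change, no response" behavior described above.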
By combining information from both the horizontal and vertical changes, BAM, we now have a direction of change for each point. Ever thought of where the term ‘color gradient’ comes from? We can now do some cool things with this information, like being able to identify any circles in our image like our marble from earlier, but rather than figure out the fancy math for this ourselves, we'll let the neural network do it. If we're willing to get weird for a second, the way artificial neural networks work is very similar to how our own brains process images. Both our brain and the artificial brain will take the edge detectors and feed the results into the next layer, which will detect intersections between edges and will be good at detecting textures and patterns. Those feed into the next layer, and the next, becoming more and more developed until entire objects can be recognized. Here are the results of using four layers of three neurons on our ASL data set. Nothing too fancy here because our images are small, so I also ran our marble through the Deep Dream Generator website, which is built using a much larger neural network on plenty of high resolution photos of many types of objects. It will enhance the patterns or objects it's able to detect. Any interpretation here is trying to mind read an artificial intelligence, which is… as unscientific as it sounds, but I could almost make out a cartoon dog. Google's Inception network was originally trained on lots of dog photos, so the tools to visualize the intermediate layers ended up displaying many dreamlike dog photos. We're almost ready for the next lab. There are just a few more tricks to help us out. Pooling is where we look at all the values in our window and do a simple statistical computation. The most common one is max pooling, where we take the largest value in the window and discard the rest.
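Max pooling as just described can be sketched in a few lines (a minimal version of our own, assuming a square window with stride equal to the window size, which is the common setup):

```python
import numpy as np

def max_pool(image, window=2):
    # Keep only the largest value in each window and discard the rest.
    # The stride equals the window size, so a 4x4 image shrinks to 2x2.
    h, w = image.shape
    out = np.zeros((h // window, w // window))
    for r in range(0, h - window + 1, window):
        for c in range(0, w - window + 1, window):
            out[r // window, c // window] = image[r:r + window, c:c + window].max()
    return out

image = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
])

print(max_pool(image))  # each 2x2 window collapses to its maximum
```

Here the 4x4 input becomes a 2x2 output, one quarter of the data, which is exactly why pooling makes the later layers cheaper to compute.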
This trick is especially useful when working with large images, because it's a way to shrink images down to a smaller image, and smaller images mean less computation, which is quicker and cheaper. Dropout is another interesting tool which, in a way, reflects ensemble learning. During training, we're going to randomly shut off neurons at a rate we specify, meaning they're unable to learn for that step of training. This prevents overfitting by eliminating overreliance on any particular input or neuron. Just be careful not to set the rate to 1. Then, all the neurons will be shut off and the model won't be able to learn. Here's the network we'll be using in the next lab. 這是我們將在下一個實驗室中使用的網路。 We also get something called batch normalization, where we normalize the amount weights change between layers during training. 我們還會用到稱為批量標準化的功能,在訓練期間標準化層之間的權重變化量。 Batch normalization was made to prevent something called internal covariate shift, but turns out it doesn't actually do that. 批量標準化是為了防止所謂的內部協變量偏移,但事實證明它實際上並沒有做到這一點。 Even so, empirically, it seems to help with object detection networks. 即便如此,從經驗來看,它似乎對物體偵測網路有所幫助。 Ready to get your hands dirty? Let's go. 準備好動手了嗎?我們走吧。 </details> ## 4. Data Augmentation and Deployment 資料增強和部署 <details> <summary>摘要</summary> ## 深度學習課程 - 資料準備與增強技巧 (Data Preparation and Augmentation Techniques) --- ## 理論介紹:資料的重要性 ### 資料質量與數量的影響 **有時,重要的不是模型,而是資料。** There are sayings around this. **垃圾進來,垃圾出去。你不能用母豬的耳朵來製作絲綢錢包。** Garbage in, garbage out. You can't make a silk purse out of a sow's ear. 就像我們人類一樣,神經網路從範例中學習,如果範例不好,模型也不會學習。 **Just like humans, neural networks learn from examples. If the examples aren't great, the model won't learn either.** --- ## 多樣化範例的重要性 ### 教導機器如何識別與區別對象 - **例子的重要性** 考慮如何教孩子什麼是狗。狗的大小、顏色、頭髮長度等都可能不同。 **Consider what it takes to teach a kid what a dog is.
Dogs vary in size, color, hair length, etc.** - **區別相似但不同的對象** 例如,土狼、狐狸、豺狼等與狗關係密切,但並非家養狗。 **Coyotes, foxes, and jackals are closely related but aren't domesticated dogs.** --- ## 資料增強的技巧 ### 使用數位工具擴展資料集 使用 Photoshop 或 GIMP 等軟體工具,我們可以對照片進行變化來擴展資料集。 **Using tools like Photoshop or GIMP, we can alter photos to expand our datasets.** --- ### 圖像翻轉 (Image Flipping) 可以將照片進行水平或垂直翻轉。 **We can flip images horizontally or vertically.** - **水平翻轉**:對於美國手語來說,由於左右手的差異,水平翻轉適用。 **Horizontal flipping is appropriate for American Sign Language as it accommodates both right-handed and left-handed variations.** - **垂直翻轉**:不適用於手語,因為人們不會倒立來比手勢。 **Vertical flipping is generally ineffective for ASL, as people don't sign upside down.** --- ### 圖像旋轉 (Image Rotation) 旋轉照片可以在一定範圍內使資料多樣化,但應避免旋轉角度過大。 **Rotating images within reason can diversify data, but excessive rotation may cause issues.** --- ### 圖像縮放 (Image Zooming) 當我們縮放照片時,需要考慮視野之外的像素,以確保能正確識別圖像。 **When zooming, we should consider pixels outside our view to maintain proper image recognition.** --- ## 通道移位與亮度調整 ### 調整像素值 **亮度調整**:對於現實世界影像來說,亮度通常不一致,因此調整亮度有助於模型適應不同光照條件。 **Brightness adjustment helps models adapt to varying lighting conditions in real-world images.** **通道移位 (Channel Shifting)**:可以調整顏色通道的像素值,以增加顏色變異性。 **Channel shifting adjusts pixel values in each color channel (e.g., RGB) to increase color variability.** --- ## 圖像準備與模型部署 ### 圖像格式與維度調整 當部署模型時,輸入的圖像可能與訓練格式不同。我們可以調整圖像的大小與顏色模式以符合模型需求。 **When deploying a model, input images may differ from training format. We can adjust image size and color mode as needed.** **將單張影像轉換為批次**:可以在影像外新增一對括號,以將其視為批次輸入。 **To convert a single image into a batch, add a new axis (or outer brackets) to the image.** --- ## 結論 **理論已經夠了,現在讓我們在實驗室進行一些練習吧!** OK, that's enough theory for now. Let's get some practice in the lab. --- </details> <details> <summary>中英原文</summary> Theory 理論 Sometimes it's not the model, but the data. 有時,重要的不是模型,而是資料。 There's a lot of sayings around this.
圍繞這件事有很多說法。 Garbage in, garbage out. You can't make a silk purse out of a sow's ear. 垃圾進來,垃圾出去。你不能用母豬的耳朵來製作絲綢錢包。 Just like us humans, neural networks learn from example, and if the examples aren't great, the model won't learn either. 就像我們人類一樣,神經網路從範例中學習,如果範例不好,模型也不會學習。 The other thing to consider is to have enough varied examples. 另一件需要考慮的事情是有足夠多的不同例子。 Let's consider what it takes to teach a kid what a dog is. There are many different kinds of dogs, not just different breeds, but even within a breed, there are many different variations. Size, color, hair length, the size of the snout, the length of the tail… 讓我們考慮一下如何教孩子什麼是狗。狗有很多不同的種類,不僅是不同的品種,而且即使在一個品種內,也有許多不同的變異。大小、顏色、頭髮長度、鼻子的大小、尾巴的長度… If I were an alien that came from a different planet, I might be surprised to learn that Chihuahuas and Great Danes are from the same species. 如果我是來自不同星球的外星人,我可能會驚訝地發現吉娃娃和大丹犬來自同一物種。 At the same time, we should also be providing many different examples of what a dog isn't. 同時,我們也應該提供許多不同的例子來說明狗不是什麼。 Coyotes, foxes and jackals are all closely related, but are technically not domesticated dogs. 土狼、狐狸和豺狼都是近親,但嚴格來說都不是家養的狗。 If we only provide two examples of what a dog is and what a dog isn't, our model would not be able to generalize its definition of a dog. 如果我們只提供狗是什麼和狗不是什麼的兩個例子,我們的模型將無法概括其對狗的定義。 Sure seems like a lot of work. 當然,看起來工作量很大。 Wouldn't it be great if we didn't have to do all of it? 如果我們不必做所有這些事情,那不是很好嗎? If you've ever spent time altering photos digitally, we're going to be using a lot of the same tools found in Photoshop or GIMP to expand our datasets. 如果您曾經花時間以數位方式修改照片,我們將使用 Photoshop 或 GIMP 中的許多相同工具來擴展我們的資料集。 For instance, even though I've changed the hue in our marble photo, we could still tell that it's a marble. 例如,即使我改變了大理石照片中的色調,我們仍然可以看出它是大理石。 Many machine learning frameworks already have tools in them. 許多機器學習框架已經內建了工具。 To alter photos. 更改照片。 We're going to go over a few of the tricks available to us.
These aren't tricks that will always work well in every situation; depending on what we're training our model to do, we might end up confusing it instead. Let's say we're making a model to identify flowers. 我們將回顧一些可用的技巧。這些技巧並不總是在每種情況下都能發揮作用,這取決於我們訓練模型的目的,我們最終可能會混淆它。假設我們正在製作一個模型來識別花朵。 Color would likely be critical information, so changing the color like I did above would end up being unhelpful. 顏色可能是關鍵訊息,因此像我上面所做的那樣更改顏色最終會毫無幫助。 Let's start with image flipping. In this case, we can imagine our photo is printed on a sheet of paper and we're just flipping that sheet of paper over. 讓我們從圖像翻轉開始。在這種情況下,我們可以想像我們的照片印在一張紙上,而我們只是將那張紙翻轉過來。 Fair warning though, flipping an image data set might not always be appropriate. 不過,公平警告,翻轉影像資料集可能並不總是合適的。 Fun fact, with a few exceptions, most words in American Sign Language can be mirrored depending on whether you're right-handed or left-handed, so horizontal flipping is appropriate for our data set. 有趣的是,除了少數例外,美國手語中的大多數單字都可以根據您是右手還是左手進行鏡像,因此水平翻轉適合我們的資料集。 On the other hand, vertical flipping probably is not effective since people aren't usually upside down when signing. 另一方面,垂直翻轉可能並不有效,因為人們在打手語時通常不會顛倒。 We could also think of the MNIST dataset: if we both vertically and horizontally flip our sixes, 如果我們垂直和水平翻轉 6,我們還可以想到 MNIST 資料集, they’ll look like nines, unnecessarily confusing our model. 它們看起來就像 9,不必要地混淆了我們的模型。 A related tool is rotation. 一個相關的工具是旋轉。 We could still imagine our piece of paper, but now we're rotating it. 我們仍然可以想像我們的這張紙,但現在我們正在旋轉它。 Is this one that's appropriate for American Sign Language? 這是適合美國手語的嗎? Within reason. Just checking in with myself, it's pretty difficult to get into the exact same position 100% of the time, but we might also expect a hand sign at 90° would be pretty out of place. 情理之中。就我自己而言,要 100% 處於完全相同的位置是相當困難的,但我們也可能認為 90° 的手勢會非常不合適。 In fact, with many of these tools, it's easy to go too far. 事實上,使用許多這樣的工具很容易走得太遠。 Let's take a look at zooming.
讓我們看一下縮放。 If we think of our sheet of paper analogy, we can imagine we're bringing the sheet closer to our faces. 如果我們用一張紙來比喻,我們可以想像我們正在把這張紙靠近我們的臉。 Here we have our marble, and we're zooming in on it more and more from left to right, top to bottom. For the first two or three, yeah, we could tell that it's still a marble, but especially for the fourth one, we're starting to lose too much information to be able to properly recognize it. 這裡有我們的彈珠,我們從左到右、從上到下都越來越放大它。對於前兩三個。是的,我們可以看出它仍然是一顆彈珠,但特別是對於第四顆彈珠,我們開始失去太多資訊而無法正確識別它。 We usually only zoom in to our pictures because if we zoom out, we'll need some sort of padding around the edges to keep our picture the same number of pixels. 我們通常只放大圖片,因為如果縮小,我們需要在邊緣周圍進行某種填充,以保持圖片的像素數相同。 When we zoom in, we can keep track of the pixels outside our field of view in order to do some shifting. 當我們放大時,我們可以追蹤視野之外的像素,以便進行一些移動。 Another word for this is translate. 另一個說法是平移 (translate)。 We're just going to move our image left to right, horizontal, or up and down, vertical. 我們只需從左到右、水平或上下、垂直移動影像。 In the center here, we've zoomed in on our marble. Each of the surrounding pictures has the same zoom, but has been shifted either up, right, down, or left. 在中心,我們放大了大理石。周圍的每張圖片都具有相同的縮放比例,但已向上、向右、向下或向左移動。 All of these ways we can warp an image in a physical dimension share an underlying concept called a homography. We could imagine we have a camera and an image, and depending on our camera's position and our image's position, we could project our image in three-dimensional space. We can do all of this with some matrix multiplication. Pretty neat. 所有這些在物理維度上扭曲圖像的方法都有一個稱為單應性的基本概念。我們可以想像我們有一台相機和一張影像,根據相機的位置和影像的位置,我們可以將影像投影到三維空間中。我們可以透過一些矩陣乘法來完成所有這些。非常巧妙。 There are other things besides physically moving our image. We could adjust the pixel values as well. Here we have some examples of brightness.
除了物理上移動我們的圖像之外,還有其他事情。我們也可以調整像素值。這裡我們有一些亮度的例子。 This is a great one to consider for real world images because unless the photos are professionally shot, it's unlikely that lighting will stay consistent. 對於現實世界的影像來說,這是一個很好的考慮因素,因為除非照片是專業拍攝的,否則照明不太可能保持一致。 Just don't get too crazy with the brightness adjuster. If it's too dark or too bright, we won't be able to see the image. 只是不要對亮度調節器太瘋狂。如果太暗或太亮,我們將看不到影像。 In the upcoming lab, we'll be working with colored images. 在即將到來的實驗室中,我們將使用彩色影像。 Colored images are 3D objects, the third dimension now representing different color channels like red, green, and blue. Each of these different channels has its own independent value. So for instance, if the red channel is 255, green is 0 and blue is 0, then the pixel value will be red. 彩色影像是 3D 物件。第三維現在代表不同的顏色通道,如紅色、綠色和藍色。每個通道都有自己獨立的數值。例如,如果紅色通道為 255,綠色為 0,藍色為 0,則像素值將為紅色。 Red 125, green 0, and blue 255 results in a purple-ish color. 紅色 125、綠色 0 和藍色 255 會產生紫色。 There are non-color channels too, like the alpha channel in RGBA, which represents transparency. 還有非顏色通道,例如 RGBA 中代表透明度的 alpha 通道。 Channel shifting takes the pixels in each of these channels and adds or subtracts a random amount. 通道移位取得每個通道中的像素並添加或減去隨機量。 A closing thought before we move on to our next section. 在我們繼續下一部分之前,這是一個結束語。 How many pictures did we get from our original marble image? 我們從原始的大理石圖像得到了多少張圖片? By my count, it's 39, including the cropped ones. 據我統計,包括剪裁後的數量,共有 39 個。 This is an extremely powerful way to get more data. 這是獲取更多數據的極其有效的方法。 Pro tip, randomly sample the images coming out of our random image generator. If the images can't be interpreted by human eyes, chances are the computer will have a tough time too. 專業提示,對來自我們的隨機影像產生器的影像進行隨機取樣。如果影像無法被人眼解讀,計算機很可能也會遇到困難。 OK, one last thought before moving on to the next lab, and that's model deployment.
好的,在進入下一個實驗室之前,最後一個想法就是模型部署。 Our model expects a particular input dimension, but it's very possible that when we deploy a model for use in production or the real world, the format of the data we get from users is different than the format we trained on. 我們的模型需要特定的輸入維度,但當我們部署模型用於生產或現實世界時,我們從使用者獲得的資料的格式很可能與我們訓練的格式不同。 Also, models typically assume training batches, meaning they learn on more than one image at a time. 此外,模型通常假設訓練批次,這意味著它一次學習多個圖像。 Let's say we get an image that we want to make a prediction on that's a different shape than our model trained on. We can resize the image to get the correct width and height. 假設我們得到了一張圖像,我們想要對其進行預測,該圖像的形狀與我們訓練的模型不同。我們可以調整圖像大小以獲得正確的寬度和高度。 If our image is in color and our model trained on grayscale, there are tools to do the conversion. 如果我們的圖像是彩色的,而模型是以灰階影像訓練的,那麼就有一些工具可以進行轉換。 Another way is to take the average between the red, green, and blue channels. 另一種方法是取紅色、綠色和藍色通道的平均值。 Finally, to turn it from a single image into a batch, we can add a new axis along the first dimension. 最後,要將其從單一影像轉換為批次,我們可以沿著第一個維度新增一個新軸。 That's just a fancy way of saying we're going to put a new pair of brackets on the outside of our image. 這只是一種奇特的說法,表示我們將在影像的外部放置一對新的括號。 OK, that's enough theory for now. Let's get some practice in the lab. 好的,現在理論已經夠了。讓我們在實驗室裡進行一些練習。 </details> ## 5.
Pre-trained Models 預訓練模型 <details> <summary>摘要</summary> # 深度學習課程 - 神經網路組件與遷移學習 (Neural Network Components and Transfer Learning) --- ## 神經網路的組件回顧 ### 學習率與神經網路層 神經網路中的一些重要組件包括: - **學習率 (Learning Rate)**:在梯度下降過程中我們改變權重的幅度。 **Learning rate is the amount we change our weights during gradient descent.** - **層數與神經元數量**:網路中的層和神經元數量。 **The number of layers and neurons in the network.** - **激活函數 (Activation Function)**:選擇激活函數來定義輸出。 **Activation functions we decide to use.** - **數據需求**:為了訓練模型,我們需要大量數據。 **In order to train a model, we need a lot of data.** --- ### 現成的模型 好消息是,越來越多的團隊和研究人員提供他們的模型供我們使用。 **The great news is that more teams and researchers are making their models readily available.** - Nvidia 的 GPU Cloud (NGC) 提供現成可下載的模型。 - TensorFlow 和 PyTorch 框架內建模組允許從 URL 加載模型。 - Keras 附帶了許多預先打包的模型,並且所有這些模型都是免費的。 **Keras comes with pre-packaged models, and all of these models are free.** --- ## VGG16 模型 - 圖像識別的深度學習架構 ### VGG16 介紹 今天我們將試驗的模型是 VGG16,它是一種適用於大規模圖像識別的深層卷積神經網路架構。 **Today, we’ll be experimenting with the VGG16 model, a deep convolutional neural network for large-scale image recognition.** - **ImageNet Challenge 2014**:VGG16 是該年挑戰賽的獲獎架構,其準確率超過 95%。 - **應用範例**:例如,可以使用 VGG16 製作自動狗狗門,以便識別並讓狗進出。 --- ## 遷移學習 (Transfer Learning) ### 為何選擇遷移學習 在完成基本模型後,我們可以使用遷移學習來應對更具體的任務,例如識別特定的狗。 **After completing the basic model, we can use transfer learning for specific tasks, like identifying a particular dog.** --- ### 遷移學習的基本步驟 1. **模型修剪**:我們可以從預訓練模型的頂層開始,然後新增專用層。 **Cut the end off a pre-trained model, keep the top layers, and build new specialized layers.** 2. **凍結或不凍結權重**:考慮是否凍結舊模型的權重。如果數據集較小,訓練大量的權重可能導致過度擬合。 **Consider freezing the weights from the old model. Large models with unfrozen weights can overfit if the dataset is too small.** 3. 
**偏差 (Bias) 的問題**:在使用遷移學習時,必須考慮資料偏差,例如資料集中某些組別的代表性過高可能會影響模型準確性。 **Data bias can occur if certain groups are overrepresented, affecting the model’s accuracy.** --- ### 夢想生成 (Dreaming) 技術 透過遷移學習,我們可以在模型中實現所謂的“夢想生成”。這是一種最大化圖像中模式的技術。 **Through transfer learning, we can implement “dreaming,” a technique to maximize the patterns seen in an image.** - 我們可以透過梯度上升誇大模型看到的模式。 **By using gradient ascent, we exaggerate the patterns the model sees.** --- ## 實作:自動狗狗門與白宮總統狗門 **現在,讓我們製作小狗門和總統專用狗門!** First things first, let’s make our doggie door and our presidential doggy door. --- </details> <details> <summary>中英原文</summary> Let's review some of the components of a neural network. There is a learning rate which is the amount we change our weights during gradient descent. 讓我們回顧一下神經網路的一些組件。有一個學習率,它是我們在梯度下降期間改變權重的量。 There's a number of layers that we have in our network, and the number of neurons… 我們的網路中有很多層,以及神經元的數量... What activation functions we decide to use… 我們決定使用什麼激活函數...... Not to mention, in order to train a model, we need a bunch of data. 更不用說,為了訓練模型,我們需要大量資料。 Sure seems like a lot of work. Wouldn't it be great if we didn't have to do all of it? 當然,看起來工作量很大。如果我們不必做所有這些事情,那該有多好? The great news is that more and more teams and researchers are making their models readily available. 好消息是,越來越多的團隊和研究人員正在提供他們的模型。 Not only are there websites like Nvidia's GPU Cloud (NGC) where you can download ready-to-go models, but TensorFlow and PyTorch have modules within their frameworks to load a model into Python given the model's URL. 不僅有像 Nvidia 的 GPU Cloud (NGC) 這樣的網站可以下載現成的模型,而且 TensorFlow 和 PyTorch 的框架內都有模組,可以根據模型的 URL 將模型載入到 Python 中。 Keras itself comes with models pre-packaged. Even better news? All of these models are free. Keras 本身附帶了預先打包的模型。甚至更好的消息?所有這些模型都是免費的。 The model we'll be experimenting with today is called the VGG16, which was proposed in an appropriately titled paper Very Deep Convolutional Networks for Large-Scale Image Recognition.
我們今天要試驗的模型稱為 VGG16,它是在一篇標題貼切的論文「用於大規模圖像識別的非常深卷積網路」中提出的。 It was a winning architecture for the 2014 ImageNet Challenge. ImageNet is a database of millions of photos that have been labeled for thousands of categories, including animals, food, trees, sports, and people. Recently, these object detection models have performed better than humans, with over 95% accuracy. 它是 2014 年 ImageNet 挑戰賽的獲獎架構。ImageNet 是一個包含數百萬張照片的資料庫,這些照片已被標記為數千個類別,包括動物、食物、樹木、運動和人物。最近,這些物體偵測模型的表現優於人類,準確率超過 95%。 What better use for this model than to make an automated doggy door? We no longer need to get up in the middle of the night to let our pets out. Since ImageNet has many pictures of animals, we're going to use it to recognize our canine friends and keep all other critters out. 對於這個模型來說,還有什麼比製作自動狗狗門更好的用途呢?我們不再需要半夜起床讓寵物出去。由於 ImageNet 有許多動物圖片,我們將使用它來識別我們的犬類朋友,並將所有其他動物拒之門外。 After completing the doggy door, we're going to take on another challenge using an approach called transfer learning. 完成狗門後,我們將使用一種稱為遷移學習的方法來應對另一個挑戰。 Good news, everyone. The United States Secret Service has learned of our great machine learning skills and has contacted us to make a doggy door for the White House. 大家好消息。美國特勤局了解了我們出色的機器學習技能,並聯繫我們為白宮製作一扇狗門。 To help us, they've given us a few pictures of Bo, the Portuguese water dog that served as First Dog from 2009 to 2017. 為了幫助我們,他們給了我們幾張葡萄牙水犬 Bo 的照片,牠是 2009 年至 2017 年的「第一狗」。 We need to make sure the door only recognizes Bo and not any other dogs trying to sneak into the White House. 我們需要確保這扇門只識別出 Bo,而不識別任何其他試圖潛入白宮的狗。 The trouble is, we can't use our pre-trained model because it can only recognize dogs in general. VGG16 can't tell the difference between Bo and other kinds of dogs. 問題是,我們無法使用預先訓練的模型,因為它只能辨識出一般的狗。VGG16 無法分辨 Bo 和其他種類的狗。 But if a model can already tell what a dog is, wouldn't it be great to use that as a starting point? 但如果一個模型已經可以分辨狗是什麼,那麼以此為起點不是很好嗎? It turns out we can. 事實證明我們可以。 We just have to do a little bit of brain surgery to make it happen.
我們只需要做一些腦部手術即可實現這一目標。 Thankfully, the surgery we're doing on our machine learning models is a lot easier than actual brain surgery. We'll take a pre-trained model and essentially cut the end of it off. We'll use the top layers of the old model and then build out our new layers on the bottom. 值得慶幸的是,我們在機器學習模型上進行的手術比實際的腦部手術要容易得多。我們將採用一個預先訓練的模型,並從本質上剪掉它的末端。我們將使用舊模型的頂層,然後在底部建立新層。 Technically, we can slice and dice these layers any way we want. 從技術上講,我們可以以任何我們想要的方式對這些層進行切片和切塊。 But there's a practical reason for using the top of the model instead of the bottom. 但使用模型的頂部而不是底部是有實際原因的。 As we move from the beginning of our model to the end, from left to right in this case, our models go from more generalized to more specific. 當我們從模型的開頭移動到結尾(在這種情況下是從左到右)時,我們的模型從更概括變為更具體。 The top of the convolutional model picks up edges, and each layer builds on the last to build more complicated shapes. 卷積模型的頂部拾取邊緣,每一層都建立在上一層的基礎上,以建立更複雜的形狀。 The more layers in the model, the more specialized those shapes are going to be, based on the data it's trained on. 模型中的層越多,這些形狀就越依其訓練資料而專門化。 These earlier patterns are easier to generalize. 這些早期的模式更容易概括。 When it comes to transfer learning, it tends to be more useful to copy the building blocks found in the earlier layers. 當涉及到遷移學習時,複製早期層中找到的構建塊往往更有用。 Another thing we should consider is whether or not we want to freeze the weights from the old model 我們應該考慮的另一件事是我們是否要凍結舊模型的權重 that we're building off of. 我們正在建構的基礎上。 Freezing is another way to say we're going to prevent the weights from updating during training, or to make them untrainable. 凍結是另一種說法,表示我們要防止權重在訓練期間更新,或使它們無法訓練。 Making the weights of the old model trainable can help specialize the new model for the new data set coming in. But be warned, if the old model is large, like VGG16, the higher number of weights will easily result in overfitting if the new data set is too small. 使舊模型的權重可訓練,可以幫助新模型針對新資料集進行專門化。但請注意,如果舊模型像 VGG16 一樣龐大,當新資料集太小時,大量的權重很容易導致過度擬合。 This would also be a good time to talk about data bias.
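The freezing step described above can be illustrated with a minimal NumPy sketch. This toy two-layer network is invented for this example (it is not the course's VGG16 notebook code): the first layer stands in for pre-trained weights and simply never receives a gradient update, while the new head is trained with gradient descent.

```python
import numpy as np

# Toy transfer-learning setup: w1 plays the role of "pre-trained" weights
# that we freeze; w2 is the new specialized head we train from scratch.
rng = np.random.default_rng(0)
w1 = rng.normal(size=(4, 2))         # frozen "pre-trained" layer
w2 = rng.normal(size=(1, 4)) * 0.1   # new trainable head

# Tiny made-up "new dataset": the target is the sum of the two inputs.
X = rng.normal(size=(2, 32))
Y = X.sum(axis=0, keepdims=True)

w1_before = w1.copy()
lr = 0.05                            # learning rate, as reviewed above
losses = []
for _ in range(300):
    h = np.maximum(0.0, w1 @ X)      # frozen feature extractor (ReLU layer)
    pred = w2 @ h                    # trainable head
    err = pred - Y
    losses.append(float((err ** 2).mean()))
    w2 -= lr * (err @ h.T) / X.shape[1]  # gradient step on the head only
    # w1 gets no update: that is all "freezing" means.

print("w1 frozen:", np.allclose(w1, w1_before))
print("loss: %.3f -> %.3f" % (losses[0], losses[-1]))
```

In Keras (which the course's notebooks presumably use), the same effect comes from setting `layer.trainable = False` on the copied layers, for example on a `VGG16(include_top=False)` base, before compiling the new model.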
Data bias is when a subset of our data is overrepresented. 這也是討論數據偏差的好時機。數據偏差是指我們的數據子集的代表性過高。 For instance, if we're looking to make a model that determines the manufacturer of a car, and we mostly have trucks in our data set, our model will have a hard time identifying vans, sports cars, and sedans. 例如,如果我們想要建立一個模型來確定汽車的製造商,而我們的資料集中主要有卡車,那麼我們的模型將很難識別貨車、跑車和轎車。 This has come up frequently with discussions on the ethical use of artificial intelligence. 這在關於人工智慧的道德使用的討論中經常出現。 If models are only trained on one ethnic group, then they can result in some embarrassing predictions for the underrepresented groups. 如果模型僅針對一個種族群體進行訓練,那麼它們可能會對代表性不足的群體做出一些令人尷尬的預測。 This is something to think about when using transfer learning, as our new model could pick up the biases of the old model. 使用遷移學習時需要考慮這一點,因為我們的新模型可能會吸收舊模型的偏差。 To end on a fun note, transfer learning is a great opportunity to do something called dreaming. This is done by feeding an image through the layers in our neural network, but instead of doing gradient descent, we're going to maximize our loss through gradient ascent. The goal is to exaggerate the patterns that the model sees in our image here. 最後以有趣的內容作結,遷移學習是嘗試所謂「夢想生成」(dreaming) 的好機會。做法是將影像輸入神經網路的各層,但我們不是進行梯度下降,而是透過梯度上升來最大化損失。目標是誇大模型在我們的圖像中看到的模式。 This image has been fed through the first 3 convolutional layers of the Inception V3 model, and what we're seeing here are the gradients. 這張影像已經輸入 Inception V3 模型的前 3 個卷積層,我們在這裡看到的是梯度。 As it progresses, the layers slowly start picking up more detail. Unfortunately, the full breakdown of the mechanisms would take more time than we have, but we've linked a tutorial on how to do it, if you're up for the challenge. 隨著進展,各層慢慢開始獲取更多細節。不幸的是,完整說明這些機制需要比我們現有更多的時間,但如果你準備好迎接挑戰,我們已經連結了說明如何做到這一點的教學。 First things first, let's make our doggie door and our presidential doggy door. 首先,讓我們製作我們的小狗門和總統小狗門。 </details> ## 6: Natural Language Processing 自然語言處理 <details> <summary>摘要</summary> 待整理... </details> <details> <summary>中英原文</summary> 待整理...
</details> ## 7: Wrap-up and Assessment 總結和評估 <details> <summary>摘要</summary> 待整理... </details> <details> <summary>中英原文</summary> 待整理... </details>