---
title: "Technical English - EMR Training: Intro to EMR (1 of 11)"
---

# EMR Training: Intro to EMR (1 of 11)

[00:00] Welcome to Hadoop and Amazon Web Services. This first section is a short overview, and it begins with the obligatory "who's this disembodied voice you're hearing during the course?" Well, I'm Ken Krugler. I've been using Hadoop for many years, since around 2006. I'm active in the Apache Tika project, and I use Hadoop almost every day to create solutions for our customers. These solutions typically involve things like large-scale web crawling and machine learning. Often the outcome of this extract, transform, and load (ETL) workflow is a Solr index. For the past two and a half years I've almost exclusively been using Amazon Web Services, both EC2 and Elastic MapReduce, to create these Hadoop solutions.

[00:44] This course assumes that you already know Hadoop; this is not an intro-to-Hadoop course. What it assumes is that you want to learn how to use Amazon's Elastic MapReduce to solve new problems. So for the next ten modules, we're going to take you from knowing nothing about it to actually being able to use it to solve your Hadoop problems. We're going to cover how to get started, which means getting an account and getting everything configured so you can actually run jobs, and how you decide what kinds of servers you need and how big your cluster needs to be. We'll have a hands-on lab that involves processing Wikipedia data. Then we're going to get into more advanced topics: how do you use the command-line tools instead of the browser interface, and how do you debug your workflows inside Elastic MapReduce? Then we're going to cover some Hive and Pig, including a Hive lab. Finally, we'll get into advanced topics, such as how to use spot pricing to reduce your cost and how to dynamically change the size of your cluster.

[01:41] Now, a fundamental question: why do you want to use Elastic MapReduce? Why are you even listening to this course? Well, there are three key reasons.

[01:49] The first has to do with cost. If you have your own cluster, like we did at my Krugle startup, here's what happens. You set up the cluster, which means you buy the hardware, you rack it, you network it, you configure it, and you have an ops person who's busy doing all those things. Then you typically don't use it all the time during development. For example, at Krugle we used our cluster maybe 20 percent of the time, which means the effective cost was almost 5x. With Elastic MapReduce, you only pay for what you actually need, when you need it, and you're not paying an ops team to maintain a cluster.

[02:26] The second key point is that you have a lot more agility. For example, if you suddenly decide "I need a cluster of 100 servers, and they need to be this kind of server," you don't have to buy those servers, rack them, or do all that other stuff. You can just say, from the command line or from the AWS browser interface, "I want a cluster that's this big," and in a couple of minutes you have it.

[02:45] Finally, you get to focus on the actual Hadoop workflow. You're not spending your time worrying about which version of Hadoop to run, how to upgrade Hadoop, and what happens when one of your clusters is having a problem with Hadoop. Amazon has a very specific version of Hadoop that's been optimized for Amazon's infrastructure, so you get maximum performance, and it reduces the number of issues that you'll run into.

## Vocabulary notes

- Obligatory: required; expected by convention
- 2006: read as "two thousand six"
- Apache: the name of a Native American people (here, the Apache Software Foundation)
- Typically: generally; usually
- Workflow: a sequence of processing steps
- Exclusively: only
- Assumes: takes for granted; presumes
- Intro: short for "introduction"
- Modules: parts (units) of a course
- Gonna: informal spoken form of "going to"
- Cover: talk about; include
- Configured: set up
- Cluster: a group of networked computers working together
- Wikipedia: in Taiwan this is often shortened to "Wiki"
- Browser interface: the web console, used through a browser such as Chrome
- Debug: find and fix problems (bugs)
- n00b: slang for someone who is new to something
- Dynamically: while the system is running, without stopping it
- Fundamental: key; basic
- 5x: five times
- Ops: operations; the people who keep systems running
- Stuff: things
- Command line: a text-based interface where you type commands
- Version: noun; Upgrade: noun and verb
- Optimized: tuned to work as well as possible
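The cost argument in the transcript (a cluster used "maybe 20 percent of the time" has an effective cost of "almost 5x") is just the reciprocal of utilization: an owned cluster costs the same every hour, so spreading that cost over only the hours actually used multiplies it by 1/u. Below is a minimal Python sketch under that assumption; the function names are hypothetical, and it deliberately ignores real-world factors such as EMR's own hourly pricing, data transfer, and cluster startup time.

```python
def effective_cost_multiplier(utilization: float) -> float:
    """Return how much more each *useful* hour costs on an always-on cluster.

    Assumes the owned cluster's cost accrues every hour whether or not
    jobs are running, so the cost per useful hour scales as 1 / utilization.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1.0 / utilization


def cost_per_useful_hour(hourly_cost: float, utilization: float) -> float:
    """Hypothetical helper: owned-cluster hourly cost spread over the hours actually used."""
    return hourly_cost * effective_cost_multiplier(utilization)


if __name__ == "__main__":
    # Example from the talk: the cluster was busy about 20% of the time.
    print(f"{effective_cost_multiplier(0.20):.1f}x")  # prints 5.0x
```

With pay-per-use billing the multiplier is, in this simplified model, 1.0, because you stop paying when the cluster is shut down.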