---
title: "Technical English - EMR Training: Intro to EMR (1 of 11)"
---
# EMR Training: Intro to EMR (1 of 11)
1
00:00:00,170 --> 00:00:02,319
welcome to Hadoop and Amazon Web
歡迎使用Hadoop和Amazon Web
Hadoop: pronounced "ha-DOOP"
2
00:00:02,520 --> 00:00:04,719
Services this first section is a short
服務的第一部分是簡短的
3
00:00:04,919 --> 00:00:06,309
overview and it begins with the
概述,它始於
4
00:00:06,509 --> 00:00:09,010
obligatory who's this disembodied voice
誰是這個無形的聲音
Obligatory: required, expected as a matter of course
5
00:00:09,210 --> 00:00:10,300
you're hearing during the course well
你在課程中聽到的?嗯,
6
00:00:10,500 --> 00:00:11,979
I'm Ken Krugler I've been using Hadoop
我是Ken Krugler,我一直在使用Hadoop
7
00:00:12,179 --> 00:00:15,010
for many years since around 2006 I'm
自2006年左右以來多年以來
2006: read as "two thousand six"
8
00:00:15,210 --> 00:00:18,039
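active in the Apache Tika project and I
積極參與Apache Tika專案,我
Apache: here, the Apache Software Foundation, the open-source organization that hosts Tika and Hadoop; not the Native American people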
9
00:00:18,239 --> 00:00:20,530
use Hadoop almost every day to create
幾乎每天都使用Hadoop來建立
10
00:00:20,730 --> 00:00:23,199
solutions for our customers these
為我們的客戶提供的解決方案
11
00:00:23,399 --> 00:00:25,140
solutions typically involve things like
解決方案通常涉及諸如
Typically: usually, in most cases
12
00:00:25,339 --> 00:00:27,609
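large-scale web crawling, machine
大規模網頁爬取、機器
Teacher's note: he had not seen "web crawling machine" as a term either; "machine" belongs to "machine learning", which continues in the next caption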
13
00:00:27,809 --> 00:00:30,789
learning. Often the outcome of this
學習。通常,這個
14
00:00:30,989 --> 00:00:33,698
extract, transform, and load workflow is a
提取、轉換和載入工作流程是一個
Workflow: a series of processing steps run in order
15
00:00:33,899 --> 00:00:35,979
Solr index and for the past two and a
Solr索引,而在過去兩年
Teacher's note: this part is just background story, not important
16
00:00:36,179 --> 00:00:38,079
half years I've almost exclusively been
半年我幾乎完全是
Exclusively=only
17
00:00:38,280 --> 00:00:40,809
using Amazon Web Services both EC2
使用Amazon Web Services,包括EC2
18
00:00:41,009 --> 00:00:43,419
and elastic MapReduce to create these
和彈性MapReduce來創建這些
19
00:00:43,619 --> 00:00:48,099
Hadoop solutions this course assumes
Hadoop解決方案。本課程假設
Assumes: takes for granted, supposes to be true
20
00:00:48,299 --> 00:00:50,169
that you already know Hadoop this is not
您已經知道Hadoop,這不是
21
00:00:50,369 --> 00:00:52,358
an intro to Hadoop course what it
Hadoop入門課程。它所
Intro: short for "introduction"
22
00:00:52,558 --> 00:00:54,099
assumes is that you want to learn how to
假設您想學習如何
23
00:00:54,299 --> 00:00:56,349
use Amazon's elastic MapReduce to solve
使用亞馬遜的彈性MapReduce解決
24
00:00:56,549 --> 00:00:58,599
the new problems so for the next ten
新問題。所以在接下來的十個
25
00:00:58,799 --> 00:01:01,149
modules we're going to take you from you
單元中,我們將帶您從
Modules = parts (units of the course)
26
00:01:01,350 --> 00:01:03,099
know nothing about it to you can
對它一無所知
27
00:01:03,299 --> 00:01:05,200
actually use it to solve your Hadoop
實際使用它來解決您的Hadoop
28
00:01:05,400 --> 00:01:08,679
problems we're gonna cover how to get
問題。我們將介紹如何
Gonna = going to (informal, spoken)
Cover=talk about
29
00:01:08,879 --> 00:01:10,299
started which means getting an account
開始,這意味著要獲得一個帳戶
30
00:01:10,500 --> 00:01:11,980
getting everything configured so you can
進行所有配置,以便您可以
Configured: set up
31
00:01:12,180 --> 00:01:14,109
actually run jobs how you decide what
實際執行工作,您如何決定
32
00:01:14,310 --> 00:01:16,119
kinds of servers and how you how big
各種伺服器以及您有多大
33
00:01:16,319 --> 00:01:17,890
your cluster needs to be we'll have a
您的集群需要是,我們將有一個
Cluster: a group of servers working together as one system
34
00:01:18,090 --> 00:01:20,109
hands-on lab which involves processing
涉及處理的動手實驗室
35
00:01:20,310 --> 00:01:21,879
Wikipedia data then we're going to get
維基百科的資料,然後我們將
Wikipedia: use the full name; according to the class note, only people in Taiwan shorten it to just "Wiki"
36
00:01:22,079 --> 00:01:24,340
into more advanced topics how do you use
進入更高級的主題,您如何使用
37
00:01:24,540 --> 00:01:26,379
the command-line tools instead of the
命令列工具而不是
38
00:01:26,579 --> 00:01:27,250
browser interface
流覽器介面
Browser interface: a website you use through a browser such as Chrome
39
00:01:27,450 --> 00:01:29,528
how do you debug your workflows inside
您如何偵錯您的工作流程
Debug: find and fix problems (bugs) in a program
40
00:01:29,728 --> 00:01:31,390
of elastic MapReduce and then we're
於彈性MapReduce內部,然後我們
41
00:01:31,590 --> 00:01:33,308
gonna cover some Hive and Pig including
會介紹一些Hive和Pig,包括
N00b = a person who is new to something (Americans really do say this)
42
00:01:33,509 --> 00:01:35,109
a Hive lab and finally we'll get into
一個Hive實作課,最後我們將進入
43
00:01:35,310 --> 00:01:37,119
advanced topics things like how do you
進階主題,例如您如何
44
00:01:37,319 --> 00:01:39,128
use spot pricing to reduce your cost how
使用現貨定價來降低成本
45
00:01:39,328 --> 00:01:41,409
can you dynamically change the size of
您可以動態更改
Dynamically = while the system is running, without stopping it
46
00:01:41,609 --> 00:01:43,869
your cluster now there's a fundamental
您的集群。現在有一個根本的
Fundamental = basic, essential
47
00:01:44,069 --> 00:01:46,058
question of why do you want to use
為什麼要使用的問題
48
00:01:46,259 --> 00:01:47,829
elastic MapReduce why are you even
彈性MapReduce為什麼你甚至
49
00:01:48,030 --> 00:01:49,058
listening to this course well there's
在聽這門課?嗯,有
50
00:01:49,259 --> 00:01:51,128
three key reasons the first has to do
三個關鍵原因。第一個
51
00:01:51,328 --> 00:01:53,619
with cost now if you have your own
與成本有關。現在,如果您有自己的
52
00:01:53,819 --> 00:01:55,448
cluster like we did at my Krugle
集群,就像我們在我的Krugle
53
00:01:55,649 --> 00:01:56,859
startup here's what happens you set up
新創公司那樣,情況是這樣的:您建立
54
00:01:57,060 --> 00:01:58,539
the cluster which means you buy the
集群,這意味著您購買了
55
00:01:58,739 --> 00:02:01,299
hardware you rack it you network it you
硬體,將它上架、連上網路,
56
00:02:01,500 --> 00:02:03,219
configure it you know you have an ops
配置它,您知道您有操作
57
00:02:03,420 --> 00:02:04,959
person who's busy doing all those things
忙著做所有這些事情的人
58
00:02:05,159 --> 00:02:08,169
and then you typically don't use it all
然後您通常不會全部使用
59
00:02:08,368 --> 00:02:10,479
the time during development like for
開發過程中的時間
60
00:02:10,679 --> 00:02:11,890
example at Krugle we used our cluster
例如在Krugle,我們使用集群的時間
Krugle: the startup the speaker founded
61
00:02:12,090 --> 00:02:13,390
maybe 20 percent of the time
大概有20%的時間
62
00:02:13,590 --> 00:02:14,890
which means the effective cost was
這意味著有效成本是
63
00:02:15,090 --> 00:02:17,770
almost 5x now with elastic MapReduce
幾乎是5倍。現在使用彈性MapReduce
5x=5 times
64
00:02:17,969 --> 00:02:20,530
you're only paying for what you actually
您只需為您實際
65
00:02:20,729 --> 00:02:23,259
need when you need it and you're not
需要,當您需要它而您不
66
00:02:23,459 --> 00:02:26,080
paying an ops team to maintain a cluster
支付操作團隊維護集群
Ops = operations: the people who keep servers and systems running
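The utilization arithmetic in the captions above can be sketched in a few lines of Python. This is only an illustration, not something shown in the course: the function names and the $10-per-hour price are made up for the example, and only the 20 percent utilization figure and the resulting "almost 5x" multiplier come from the narration.

```python
def effective_cost_multiplier(utilization: float) -> float:
    """How many times more each *used* hour costs on an always-on cluster.

    utilization is the fraction of time the cluster does useful work (0 < u <= 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return 1.0 / utilization


def cost_per_used_hour(hourly_cost: float, utilization: float) -> float:
    """Cost of one hour of actual use when the cluster is paid for around the clock."""
    return hourly_cost * effective_cost_multiplier(utilization)


# 20 percent utilization is the figure quoted in the narration.
multiplier = effective_cost_multiplier(0.20)

# Hypothetical price: an owned cluster that costs $10 for every hour it exists.
owned = cost_per_used_hour(10.0, 0.20)

print(f"multiplier: {multiplier:.1f}x")
print(f"cost per used hour: ${owned:.2f}")
```

With pay-per-use pricing, idle hours are not billed, so the same function called with a utilization of 1.0 gives a multiplier of 1. The sketch deliberately ignores any difference in hourly rates between owned and rented servers.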
67
00:02:26,280 --> 00:02:28,030
now the second key point is you have a
現在第二個關鍵點是
68
00:02:28,229 --> 00:02:30,009
lot more agility for example if you
例如更多的敏捷性
69
00:02:30,209 --> 00:02:31,960
suddenly decide I need a cluster of 100
突然決定我需要一個有100台
70
00:02:32,159 --> 00:02:33,700
servers and they need to be this kind of
伺服器,他們需要這種
71
00:02:33,900 --> 00:02:35,200
server you don't have to buy those
伺服器,您不必購買那些
72
00:02:35,400 --> 00:02:36,520
servers or rack them or do all that
伺服器或機架它們或做所有這些
73
00:02:36,719 --> 00:02:38,530
other stuff you can just from the
你可以從其他東西
Stuff = things (informal)
74
00:02:38,729 --> 00:02:40,840
command line or from the AWS browser
命令列或從AWS流覽器
Command line = a text interface where you type commands
75
00:02:41,039 --> 00:02:42,789
interface say I want a cluster that's
介面說我想要一個集群
76
00:02:42,989 --> 00:02:44,649
this big and in a couple minutes you
這麼大,幾分鐘後
77
00:02:44,848 --> 00:02:48,930
have it and finally you get to focus on
有了它,最後你開始專注於
78
00:02:49,129 --> 00:02:51,400
the actual Hadoop workflow you're not
實際的Hadoop工作流程,您不必
79
00:02:51,599 --> 00:02:53,110
spending your time worrying about which
花時間擔心哪個
80
00:02:53,310 --> 00:02:54,430
version of Hadoop how to upgrade
Hadoop的版本、如何升級
Version (noun) = a particular release of a piece of software
Upgrade (noun, verb) = a move, or to move, to a newer version
81
00:02:54,629 --> 00:02:57,009
Hadoop and what happens when one of your
Hadoop,當您的一個
82
00:02:57,209 --> 00:02:58,300
clusters is having a problem with the
集群有問題
83
00:02:58,500 --> 00:03:01,000
Hadoop Amazon has a very specific
Hadoop。Amazon有一個非常特定的
84
00:03:01,199 --> 00:03:02,830
version of Hadoop that's been optimized
經過優化的Hadoop版本
Optimized = tuned to work as well as possible
85
00:03:03,030 --> 00:03:05,618
for Amazon's infrastructure so you get
亞馬遜的基礎設施
86
00:03:05,818 --> 00:03:08,110
maximum performance and it reduces the
最大的性能,它減少了
87
00:03:08,310 --> 00:03:13,310
number of issues that you'll run into
您將遇到的問題數量