# 實驗一:美國棒球大聯盟MLB資料分析 ### 主講人:黃夙賢 --- ## 美國棒球大聯盟MLB資料分析 - 美國著名體育記者Sean Lahman,針對美國棒球大聯盟MLB,提供Lahman’s Baseball Database棒球資料庫,詳細記載美國大聯盟MLB 1871年成立以來至今的各項棒球數據,為棒球愛好者所樂於使用之資料集。本資料集適用於教導學生,如何於巨量資料中,利用Hadoop Hive查詢介面,找出各項統計數據。 --- ## 資料來源 - [https://github.com/chadwickbureau/baseballdatabank /raw/master/core/Batting.csv](https://github.com/chadwickbureau/baseballdatabank/raw/master/core/Batting.csv) --- ## 請回答下列問題 - Q1: 共有幾筆資料 - Q2: 2021年的全壘打王是誰 - Q3: 列出歷年來的全壘王 --- ## MLB大聯盟資料檔 - wget [https://github.com/chadwickbureau/baseballdatabank /raw/master/core/Batting.csv](https://github.com/chadwickbureau/baseballdatabank/raw/master/core/Batting.csv) - hadoop fs -put Batting.csv /user/mapr - hadoop fs –ls --- ![](https://i.imgur.com/F234N34.png) ![](https://i.imgur.com/dCzG3Cu.png) --- ## 新增資料表 ```sql= create table batting (playerID STRING, yearID STRING, stint INT, teamID STRING, lgID STRING, G INT, AB INT, R INT, H INT, twoB INT, threeB INT, HR INT, RBI INT, SB INT, CS INT, BB INT, SO INT, IBB INT, HBP INT, SH INT, SF INT, GIDP INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' tblproperties("skip.header.line.count"="1"); ``` --- ![](https://i.imgur.com/X05t7jR.png) --- ## 讀入資料 - LOAD DATA INPATH '/user/mapr/Batting.csv' OVERWRITE INTO TABLE batting; ![](https://i.imgur.com/25LTyrX.png) --- ## Q1:共有幾筆資料 - select count(*) from batting; - 答案: 110495筆資料 (2021年資料) ![](https://i.imgur.com/KCzoGvO.png) --- ## Q2:2021年的全壘打王是誰 - [select T.playerid, T.yearid, T.hr from batting T where T.yearid=2021 and T.hr in (select max(hr) from batting where yearid=2021);]() - 加速版本: - select max(hr) from batting where yearid=2021 - select playerid, yearid, hr from batting where yearid=2021 and hr=48 - Ans: Perez Diaz, 全壘打數量是48支 --- ![](https://i.imgur.com/hvQr7xS.png) ![](https://i.imgur.com/fiuWiGB.png) --- ## Q3:列出歷年來的全壘打王 - [select T.playerid, T.yearid, T.hr from batting T join (select yearid, max(hr) as hr from batting group by yearid) a on T.yearid=a.yearid and T.hr=a.hr order by t.yearid desc;]() --- ![](https://i.imgur.com/A6WQn1K.png)
{"metaMigratedAt":"2023-06-17T10:19:43.530Z","metaMigratedFrom":"YAML","title":"實驗一:美國棒球大聯盟MLB資料分析","breaks":true,"contributors":"[{\"id\":\"ef0225b9-6c2a-4012-82c9-fa1031d2c4db\",\"add\":2350,\"del\":160}]","description":"美國著名體育記者Sean Lahman,針對美國棒球大聯盟MLB,提供Lahman’s Baseball Database棒球資料庫,詳細記載美國大聯盟MLB 1871年成立以來至今的各項棒球數據,為棒球愛好者所樂於使用之資料集。本資料集適用於教導學生,如何於巨量資料中,利用Hadoop Hive查詢介面,找出各項統計數據。"}
    586 views