ETL Assignment
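I am opening my first technical article with ETL; it is also the first post in which I track my own growth. To be clear up front, this ETL exercise is the final assignment of Python Project for Data Engineering, a course in the IBM Data Engineering Professional Certificate on the Coursera online learning platform. The assignment asks for a simple ETL program in Python, which can be broken down into four parts: Extract, Transform, Load, and Logging.
Assignment requirements: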
Objectives In this final part you will:
- Run the ETL process.
- Extract bank and market cap data from the JSON file bank_market_cap.json.
- Transform the market cap currency using the exchange rate data.
- Load the transformed data into a separate CSV.
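Preparation:
- Install or update the packages the project needs, such as pandas and requests; glob and datetime ship with the Python standard library and need no installation.
- Import the required modules.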
Assignment requirements:
Extract Function
Define the extract function that finds JSON file bank_market_cap_1.json and calls the function created above to extract data from them. Store the data in a pandas dataframe. Use the following list for the columns
Write the extract functions for JSON and CSV.
- Load the data from bank_market_cap_1.json into a dataframe.
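The extract step above can be sketched roughly as follows. This is only a minimal sketch: the assignment's column list is not reproduced here, so the `COLUMNS` names are assumptions (`Name` in particular), as is the `data_dir` parameter, and the JSON file is assumed to hold a list of records.

```python
import glob
import os

import pandas as pd

# Assumed column names; the assignment's actual column list is not shown here.
COLUMNS = ["Name", "Market Cap (USD Billion)"]


def extract_from_json(file_to_process):
    """Read one JSON file (assumed to be a list of records) into a DataFrame."""
    return pd.read_json(file_to_process)


def extract(data_dir="."):
    """Collect every bank_market_cap_1.json found in data_dir into one DataFrame."""
    pattern = os.path.join(data_dir, "bank_market_cap_1.json")
    frames = [extract_from_json(path) for path in glob.glob(pattern)]
    if not frames:
        # Nothing matched: return an empty frame with the expected columns.
        return pd.DataFrame(columns=COLUMNS)
    return pd.concat(frames, ignore_index=True)[COLUMNS]
```

Collecting the frames in a list and calling `pd.concat` once avoids growing a dataframe row by row.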
Assignment requirements:
Load the file exchange_rates.csv as a dataframe and find the exchange rate for British pounds with the symbol GBP, store it in the variable exchange_rate, you will be asked for the number. Hint: set the parameter index_col to 0.
- Use a DataFrame to rename the 'Unnamed: 0' column of the exchange-rate data to currency and set it as the index; use loc to look up the GBP exchange rate.
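A minimal sketch of this lookup, assuming the rate values sit in a column called `Rates` (that column name and the helper name `load_exchange_rate` are my assumptions, not something stated in the assignment text). Passing `index_col=0` makes the unnamed first column the index directly, so only the index name needs to be set.

```python
import pandas as pd


def load_exchange_rate(csv_path, symbol="GBP", rate_column="Rates"):
    """Return the exchange rate for one currency symbol.

    rate_column is an assumed name for the column holding the rates.
    """
    # The first (unnamed) column holds the currency symbols; use it as the index.
    rates = pd.read_csv(csv_path, index_col=0)
    rates.index.name = "currency"
    # loc selects by label: row = currency symbol, column = rate column.
    return float(rates.loc[symbol, rate_column])
```

Usage would look like `exchange_rate = load_exchange_rate("exchange_rates.csv")`.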
Assignment requirements:
Using exchange_rate and the exchange_rates.csv file, find the exchange rate of USD to GBP. Write a transform function that:
- Changes the Market Cap (USD Billion) column from USD to GBP.
- Rounds the Market Cap (USD Billion) column to 3 decimal places.
- Renames Market Cap (USD Billion) to Market Cap (GBP Billion).
- Convert the market cap denominated in US dollars into a market cap denominated in British pounds.
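The three transform requirements map onto one multiplication, one `round`, and one `rename`. The sketch below is one possible way to write it, assuming `exchange_rate` is the GBP value per 1 USD; working on a copy is my own choice so that the extracted dataframe stays untouched.

```python
import pandas as pd

USD_COLUMN = "Market Cap (USD Billion)"
GBP_COLUMN = "Market Cap (GBP Billion)"


def transform(df, exchange_rate):
    """Convert the market cap from USD to GBP, round to 3 places, rename the column."""
    out = df.copy()  # leave the caller's dataframe unchanged
    out[USD_COLUMN] = (out[USD_COLUMN] * exchange_rate).round(3)
    return out.rename(columns={USD_COLUMN: GBP_COLUMN})
```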
3. Load
Assignment requirements:
Create a function that takes a dataframe and loads it to a CSV named bank_market_cap_gbp.csv. Make sure to set index to False.
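A minimal sketch of such a load function. The `target_file` parameter is my own addition so the output path can be changed; its default is the file name the assignment gives.

```python
import pandas as pd


def load(df, target_file="bank_market_cap_gbp.csv"):
    """Write the dataframe to CSV without the index column."""
    df.to_csv(target_file, index=False)
```

With `index=False` the CSV contains only the data columns, so reading it back does not produce an extra unnamed column.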
4. Logging Function
- The logging function records whether each step of the ETL program completed successfully.
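One simple way to write such a function is to append a timestamped line to a text file. This is only a sketch: the file name `logfile.txt`, the timestamp format, and the `timestamp,message` layout are assumptions rather than anything the assignment prescribes.

```python
from datetime import datetime


def log(message, logfile="logfile.txt"):
    """Append one 'timestamp,message' line to the log file."""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # Mode "a" appends, so earlier entries are kept between runs.
    with open(logfile, "a", encoding="utf-8") as f:
        f.write(f"{timestamp},{message}\n")
```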
5. Running the ETL Process
- Log whether the Extract step ran successfully.
- Log whether the Transform step ran successfully.
- Log whether the Load step ran successfully.
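Putting the steps together might look like the sketch below. To keep it self-contained, the four steps are passed in as callables, and the `run_etl` name and the log messages are my own assumptions. It logs the start and end of each phase; an "Ended" entry is only written when the phase returns without raising an exception.

```python
def run_etl(extract, transform, load, log):
    """Run Extract, Transform, and Load in order, logging around each phase."""
    log("ETL Job Started")

    log("Extract phase Started")
    extracted = extract()
    log("Extract phase Ended")

    log("Transform phase Started")
    transformed = transform(extracted)
    log("Transform phase Ended")

    log("Load phase Started")
    load(transformed)
    log("Load phase Ended")

    log("ETL Job Ended")
    return transformed
```

In practice the `transform` argument would be a small wrapper that already knows the exchange rate, for example `lambda df: transform(df, exchange_rate)`.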
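Summary:
That is how I approached this assignment. I am still very much a beginner, so both the code and the writing have room for improvement; feedback from more experienced readers is welcome. Thank you.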