陳子傑
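# What Is Information, and Can You Eat It? Fetching Online Data with Python and Visualizing It

[Colab sample code - from government open data (.csv)](https://colab.research.google.com/drive/1fgEBaSUqRtT_kRR0oVu4xLcD1QKpGyjb#scrollTo=lU-wfkEUqp4z)
[Colab sample code - Kaggle and Titanic](https://colab.research.google.com/drive/15HQyAm3qkKP6ibbOEWFTcPGBXd9NsFwg#scrollTo=PyLUJNKy4F6P)

## Data Workflow and Division of Work

A data analysis project can be broken into the following steps:

1. **Data acquisition**: download data from a government or open-data platform, for example by using an API to fetch a `.csv` file.
2. **Data preprocessing**: clean and tidy the data, including dropping unneeded columns, handling duplicate records, and converting data formats.
3. **Data analysis**: use arithmetic and grouping operations to compute metrics or produce summaries.
4. **Data visualization**: present the results with charts (such as bar charts and line charts).
5. **Insight and application**: draw conclusions from the results and apply them to real decisions or research.

## Simple Data Analysis - Starting from Government Open Data

>[!Note]Government Open Data Platform
>Electronic **data** of all kinds that agencies obtain or produce within their authority and may lawfully disclose, including text, figures, pictures, video, audio, and **metadata**, published online in **open** formats so that individuals, schools, groups, companies, **government** agencies, and other users can link to, download, and use it as needed.

## What Is Data, and Can a Computer Read It?

Websites that publish open data usually offer it as a `.csv` file (comma-separated file).

>[!Note]CSV (Comma-Separated Values)
>A common and highly portable data storage format in which the fields of each record are separated by commas.

We will use two Python libraries to help with **fetching and tidying data** and with **data visualization**.

## The pandas and matplotlib Libraries

|Name|Description|
|-|-|
|pandas|Data processing and analysis module (data handling).|
|matplotlib|Visualizes data (chart making).|

---

# Example: Analyzing Air Quality Forecast Data

## 1. Data Acquisition

Use the `read_csv` function from `pandas` to read the data. The data source is the [air quality forecast dataset](https://data.gov.tw/dataset/6349).

```python
import pandas as pd
import matplotlib.pyplot as plt

path = 'https://data.moenv.gov.tw/api/v2/aqf_p_01?api_key=e8dd42e6-9b8b-43f8-991e-b3dee723a52d&limit=1000&sort=publishtime+desc&format=CSV'
df = pd.read_csv(path)
```

>[!Tip]Where is the link?
>![image](https://hackmd.io/_uploads/ry8evVgj0.png)
> Right-click the .csv download button and choose "Copy link address".

### **Use `read_csv(path)` to load the data into a DataFrame**

`df = pd.read_csv(path)`

- The `read_csv` function from `pandas` reads the data in the CSV file and stores it in a DataFrame named `df`.
- You can think of it as pouring an Excel sheet into the variable `df`.

## 2. Data Preprocessing

### Check the name of every column

```python
print(df.columns)
```

- This lets you confirm what each column's label means.

> ![image](https://hackmd.io/_uploads/H1LAuEeiA.png)

- From the result, the labels include `content`, `publishtime`, and so on, so the first column always holds the "description", the second column always holds the "publish time", and so on.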
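
The loading and column check above can also be tried without network access. The following is a minimal sketch, assuming a tiny in-memory CSV in place of the real download: `SAMPLE_CSV`, `load_forecast`, and `sample_df` are hypothetical names introduced only for illustration, the rows are made-up values rather than real forecast data, and the column order after the first two columns is an assumption.

```python
import io

import pandas as pd

# Hypothetical sample rows that reuse the column names from this tutorial.
# The values are made up for illustration; they are not real forecast data.
SAMPLE_CSV = """content,publishtime,area,majorpollutant,forecastdate,aqi,minorpollutant,minorpollutantaqi
sample text,2024/08/15 10:30,北部,,2024-08-15,45,,
sample text,2024/08/15 10:30,中部,,2024-08-15,60,,
sample text,2024/08/16 10:30,北部,,2024-08-16,50,,
"""


def load_forecast(source):
    """Read CSV data from a URL, a file path, or a file-like object."""
    return pd.read_csv(source)


# io.StringIO wraps the text so read_csv can treat it like a file.
sample_df = load_forecast(io.StringIO(SAMPLE_CSV))

print(sample_df.columns.tolist())  # the column labels
print(sample_df.shape)             # (number of rows, number of columns)
```

Passing the `path` URL from step 1 to `load_forecast` instead of the `StringIO` object should behave the same way as `pd.read_csv(path)`, since the helper only forwards its argument.
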
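### Drop the information we don't need

We want to analyze "the average over time for one area" and "the average across areas for one point in time".
=> So the only columns we really need are `'area'`, `'forecastdate'`, and `'aqi'`.

```python
df.drop(['publishtime','content', 'majorpollutant','minorpollutant','minorpollutantaqi'], axis=1, inplace=True)
```

>[!Note]
>`axis=1` specifies that columns, not rows, are dropped.
>`inplace=True` means the original DataFrame is modified directly.

![S__3027778](https://hackmd.io/_uploads/ryBmoNxj0.jpg)

- All the unwanted columns are removed. Because of `inplace=True`, the change is made on the original DataFrame and no new DataFrame is returned.

### If you look closely, you may notice a flaw in the raw data

If you have looked at the raw data, you will see that some records are **entered more than once**.

>[!Caution]
>Our guess is that this happens because the data is a forecast that is republished every day:
>> On 8/15, the forecast for 8/15 - 8/17 is published
>> On 8/16, the forecast for 8/16 - 8/18 is published
>> So the 8/17 forecast, for example, is published twice (with identical values)

We can therefore remove the duplicate rows with `.drop_duplicates()`. It returns a new DataFrame, so the result is stored in `df_unique`:

```python
# drop_duplicates() does not modify df; keep the result under a new name
df_unique = df.drop_duplicates().copy()
```

![S__3027779](https://hackmd.io/_uploads/HkA4pVgjA.jpg)

### Compare the length of the data with and without duplicates

```python
print(len(df))
print(len(df_unique))
```

![image](https://hackmd.io/_uploads/ry1KaNloR.png)

>This means duplicate entries made up roughly 80% of the data.

## 3. Data Analysis

### Find the earliest and latest dates

```python
# Convert the 'forecastdate' column to datetime objects
df_unique['forecastdate'] = pd.to_datetime(df_unique['forecastdate'])

# Find the earliest and latest dates
earliest_date = df_unique['forecastdate'].min()
latest_date = df_unique['forecastdate'].max()

print("Earliest date:", earliest_date.strftime('%Y/%m/%d'))
print("Latest date:", latest_date.strftime('%Y/%m/%d'))
```

- Finding the earliest and latest forecast dates in the data helps when we build the chart later.
- First convert the dates to datetime objects, then find the minimum and maximum of the **`forecastdate` column**.
- Display them with `strftime('%Y/%m/%d')`.

### Data correction *(Mapping/Replace)* - convert the Chinese values to English

English labels are used here because matplotlib's default font typically cannot render Chinese characters in chart labels.

```python
area_mapping = {
    '北部': 'North',
    '竹苗': 'HsinchuMiaoli',
    '中部': 'Central',
    '雲嘉南': 'YunChiayiTainan',
    '高屏': 'KaohsiungPingtung',
    '宜蘭': 'Yilan',
    '花東': 'HualienTaitung',
    '馬祖': 'Matsu',
    '金門': 'Kinmen',
    '澎湖': 'Penghu'
}
df_unique['area'] = df_unique['area'].replace(area_mapping)
```

### Group the data and take the mean

Our goal is to find "the average AQI of each area over the time range".
=> We need to group the data by the `area` label and compute the average AQI of each area.
=> This shows how each area's air quality performed over the whole time range.

The following code does this:

```python
area_aqi = df_unique.groupby('area')['aqi'].mean()
```

>[!Note]
>`groupby('area')`: groups the data by the values in the `area` column, collecting all rows with the same `area` label together.
>`['aqi'].mean()`: computes the mean of the `aqi` column for each group, which gives the average AQI of each area.

Print the result with `print(area_aqi)`.

![image](https://hackmd.io/_uploads/rJc0WBeoR.png)

## 4. Data Visualization

### Build a chart from the `area_aqi` result

Using the result we just computed, we draw a **bar chart** that shows the AQI value of each area.

> `area_aqi.plot(kind='barh')` = draw `area_aqi` as a horizontal bar chart
> `xlabel()` = set the x-axis label
> `ylabel()` = set the y-axis label
> `title()` = set the chart title
> `show()` = display the finished chart

```python
area_aqi.plot(kind='barh')
plt.xlabel('Avg AQI')
plt.ylabel('Area')
plt.title('Each Area AQI in ' + earliest_date.strftime('%Y/%m/%d') + ' to ' + latest_date.strftime('%Y/%m/%d'))
plt.savefig('area_aqi_chart.png')  # save before show(), which may clear the figure
plt.show()
```

Once displayed, the chart should look like this:

![image](https://hackmd.io/_uploads/SkzoMrgoR.png)

### Your turn

Suppose we now want to "group by time, evaluate **the average AQI of all areas for each date**, and show it as a **line chart**". How would you do it?

>[!Tip]Hints
> 1. To reach this goal, we need the average AQI of ==rows that share the same date==. How do you put rows with the same date into the same group?
> 2. `XXX.plot(kind='line')` = draw your data as a line chart. How do you set the x-axis and y-axis labels?

## Extra: If You Want to Keep Exploring

### Introduction to Kaggle

Kaggle is a platform designed for data science and machine learning enthusiasts. It offers the following features and resources:

- A large collection of public datasets covering many topics: movie reviews, e-commerce, medical data, and more. Data can be downloaded and processed directly in a Kaggle Notebook, with no local environment setup.
- Competitions: a wide range of machine learning competitions, with suitable challenges for beginners through advanced users. Common topics include classification, regression, image processing, and natural language processing.
- A large amount of learning resources

### If you want to use Kaggle datasets in Colab

In Google Colab you can **download a Kaggle dataset directly onto the Colab virtual machine**, which avoids using resources on your own computer.

Doing this requires Kaggle API credentials:

![image](https://hackmd.io/_uploads/rydbUmsfJx.png)

On this screen, click Settings.

![image](https://hackmd.io/_uploads/BkFmLQoGyl.png)

In Settings, click Create New Token to get the credentials.

![image](https://hackmd.io/_uploads/S1lfqU7sG1g.png)

Running code like this prompts you to upload a file. Upload the `.json` file you just downloaded (your Kaggle API token), and the Colab environment can then use Kaggle datasets.

![image](https://hackmd.io/_uploads/rJujvXoMJl.png)

To download the dataset you want, copy the command that Kaggle provides into your own notebook and run it. The download then starts.

![image](https://hackmd.io/_uploads/SJncj7oMkl.png)

After that, you can analyze the data in the same way as described above.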
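
The notebook cells in the screenshots above are not reproduced as text, so the following is only a minimal sketch of one common way to do the token step, not necessarily the code shown in the images. It assumes the Kaggle command-line tool looks for the token at `~/.kaggle/kaggle.json`; `install_kaggle_token` and its `home_dir` parameter are hypothetical names introduced for illustration.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(token_path, home_dir=None):
    """Copy kaggle.json into <home>/.kaggle and restrict access to the owner.

    home_dir is a hypothetical parameter that defaults to the current
    user's home directory.
    """
    home = Path(home_dir) if home_dir is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / "kaggle.json"
    shutil.copyfile(token_path, target)
    os.chmod(target, 0o600)  # owner read/write only, so the key stays private
    return target


# Assumed Colab workflow (run these in notebook cells, not as plain Python):
#   from google.colab import files
#   files.upload()                         # pick the downloaded kaggle.json
#   install_kaggle_token("kaggle.json")
#   !kaggle datasets download -d <owner>/<dataset-name>
```

The `<owner>/<dataset-name>` placeholder stands for whatever identifier the dataset page on Kaggle gives you. After the download finishes, the file can be read with `pd.read_csv` just like the government CSV earlier.
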
