wa.__.wa
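# Pandas Module and Data Types / Series / DataFrames Study Notes
<!-- Preface: what matters is learning the approach; look up the exact syntax afterwards -->
> [TOC]

Study method suggested by the class teacher:
read the official API documentation, and inspect the type and the available methods of each value.
For example:
```python
print(type(df.loc[0,'bdy']))
print(dir(df.loc[0,'bdy']))
```
:::success
Taiwanese and Mainland Chinese usage swap the words for rows and columns,
so take care when reading material online.
In Taiwanese usage: 直行 (vertical) = column, 橫列 (horizontal) = row. When indexing, the row comes first and the column second.
:::

---
Scratch notes:
The most powerful pandas function: `apply`
pandas is the child of NumPy (it is built on top of it)
> Further notes: [NUMPY Matplotlib 學習筆記](/YGK1t_tZSoOa8xHoTS2zGA)

After obtaining the data, read it in to produce one- or two-dimensional data.
Data formats:
1. CSV and Excel files can be read with the pandas module
2. Plain-text files are read with Python built-in functions (BIF)

Topics covered:
Get to know the pandas-specific data types.
Learn how to add, modify, delete, sort, index, and slice data.
Additionally, learn how to handle date data, and what the row-index object and column-index object are.
Finally, visualize the cleaned-up data.
Tools that can be used:
* PowerBI
* matplotlib
* seaborn
* plotly
* streamlit
* tkinter

Steps: read the file, convert it into a form pandas can read, rename the columns, check the data types.

Ways to create a DataFrame:
from a dictionary, whose keys become the column labels,
or from several rows of data, specifying the row and column indexes.
![Xnip2024-08-23_10-41-39](https://hackmd.io/_uploads/rJ5DiOrjC.png)

---
# Pandas Series: one-dimensional data object
First way to see a Series: build one from a dictionary.
When a Series is built from a dictionary, the dictionary keys become the index and the values become the data.
Example from the official documentation:
```python
import pandas as pd
d = {'a': 1, 'b': 2, 'c': 3}
ser = pd.Series(data=d, index=['a', 'b', 'c'])
print(ser)
```
output :
```
a    1
b    2
c    3
dtype: int64
```
Second way to see a Series:
after converting the data to a DataFrame, `df.dtypes` returns a Series that stores the data type of each column.
![Xnip2024-08-23_10-59-48](https://hackmd.io/_uploads/BJrVidSjC.png)
```python
import pandas as pd

# Read the tab-separated text file; "with" closes the file automatically
with open("myscore.txt") as f:
    list_all = f.readlines()
# print(list_all)

score = []
# Pre-process the data into a form pandas can read
for item in list_all:
    # Drop the trailing newline so it does not end up in the last field
    a = item.rstrip('\n').split('\t')
    score.append(a)
    # print(a)
# print(score)

df = pd.DataFrame(score)
# Student ID, name, Chinese, English, math, blood type, birthday
df.columns = ["學號","姓名","國文","英文","數學","血型","生日"]
print(type(df.dtypes))
print(df.dtypes)
print(df)
```
# Combining Series / DataFrame data
References:
https://medium.com/@hui509/pandas-新增-series-或-dataframe-bb41c4098603
https://pandas.pydata.org/docs/reference/api/pandas.concat.html#pandas.concat

Keep in mind that a Series is one-dimensional.
```python
import pandas as pd

# Placeholder data: the original note does not show how `city` was defined
city = pd.Series(['Taiwan','Japan'])
New = pd.Series(['USA','Canada','UK'])
# Solution 1 - _append
# New_countries = city._append(New,ignore_index=True)
# _append is a private method (leading underscore); advanced usage, skip it for now
# Solution 2 - concat
New_countries = pd.concat([city,New],ignore_index=True)
# ignore_index rebuilds the index to avoid duplicates, keeping it continuous and unique
# For one-dimensional data, plain Python lists joined with + also work, with no pandas needed
print(city)
print(New_countries)
```
# Changing data types: astype
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html#pandas.DataFrame.astype
```python
df.columns = ["學號","姓名","國文","英文","數學","血型","生日"]
# Convert the Chinese-score column (國文) to unsigned 8-bit integers
df = df.astype({"國文":"uint8"})
```
Use `df.info()` to inspect the table structure:
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29 entries, 0 to 28
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   學號      29 non-null     object
 1   姓名      29 non-null     object
 2   國文      29 non-null     uint8
 3   英文      29 non-null     object
 4   數學      29 non-null     object
 5   血型      29 non-null     object
 6   生日      29 non-null     object
dtypes: object(6), uint8(1)
memory usage: 1.5+ KB
None
```
The trailing `None` appears because `df.info()` prints the summary itself and returns `None`; wrapping it in `print()` echoes that return value.

Note: `astype` returns a new object, so the result has to be assigned back.
![image](https://hackmd.io/_uploads/SyGFmYHiA.png)
# Adding a column / an index row
> df["column"] = list(range(1,10))
> df.loc["index"] = list(range(1,10))
```python
# Same idea as NumPy broadcasting: the scalar is copied to every row
# See also numpy.arange
df['平均'] = 0   # 平均 = average
# df['平均'] = list(range(1,30))
```
Pay particular attention to the following with a DataFrame:
Adding a column (直行): `df.loc[:,'column']` can be shortened to `df['column']`
Adding a row (橫列): `df.loc[1000]` cannot be shortened
```python
import pandas as pd

data = {'name':['Larry','Lin'], 'bdy':['1130','1212']}
df2 = pd.DataFrame(data, index=['1', '2'])
# print(df2)
# Add one row; a string label keeps the index consistent with '1' and '2'
df2.loc['3', :] = ['Wang', '0101']
print(df2)
# Add the column 'blood'
df2['blood'] = 1000
# df2[:,'blood'] is wrong; the correct form is df2.loc[:,'blood']
print(df2)
# _append is awkward and best avoided; you also have to think about the index
```
# Adding multiple records with concat
```python
import pandas as pd

data = {'name':['Larry','Lin'], 'bdy':['1130','1212']}
df2 = pd.DataFrame(data)
# print(df2)
# A single record
df2.loc[2] = ['Wang','0101']
print(df2)
# df2['blood'] = 1000
# df2[:,'blood'] = 1000
# print(df2)
# _append is awkward and best avoided; you also have to think about the index
# Multiple records
data2 = {'blood':['A','B','AB']}
df3 = pd.DataFrame(data2)
# axis=1 places the frames side by side, aligned on the row index
new = pd.concat([df2,df3], axis=1)
print(new)
# axis=0 stacks the frames vertically instead
```
# Deleting data
(Not recorded.)
# Modifying data
Use labels or positions as the index to select the location to modify.
> .loc["index label"]

The examples from here on use the English column names `ch`, `en`, `math`, and `avg`.
```python
# Modify data, indexing and slicing
# df['avg'] = 0
# df['avg'] = list(range(1,30))
df.loc[0,'avg'] = round(( df.loc[0,'ch'] + df.loc[0,'en'] + df.loc[0,'math'] ) / 3,2)
# Select data by label
# A single column
df_avg = df.loc[:,'avg'] # can be shortened to df_avg = df['avg']
```
The teacher stressed that you need to know whether you are working with a Series or a DataFrame.
Most tutorials online use DataFrame examples.
The following uses the Series `round` method (DataFrame has one too):
```python
# Compute the average score and add it as a new column 'avg'
df['avg'] = ( (df['ch'] + df['en'] + df['math']) / 3 ).round(2)
```
# Sorting
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html#pandas.DataFrame.sort_values

`Series.rank` computes rankings; pass `ascending=False` so that the highest score gets rank 1.
```python
df['rank'] = df['avg'].rank(method='min',ascending=False).astype(int)
df = df.sort_values(['rank'])
```
```python
import pandas as pd
# Create a Series that contains duplicate values
data = pd.Series([70, 90, 70, 80, 90, 70])
# Rank with different methods
rank_average = data.rank(method='average', ascending=False) # ties share the average of their ranks
rank_min = data.rank(method='min', ascending=False) # ties all get the lowest rank in the group
rank_max = data.rank(method='max', ascending=False) # ties all get the highest rank in the group
# Print the results
print("Original Data:\n", data)
print("Rank by Average:\n", rank_average)
print("Rank by Min:\n", rank_min)
print("Rank by Max:\n", rank_max)
```
DataFrame rank:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rank.html#pandas.DataFrame.rank
# Series strip
https://pandas.pydata.org/docs/reference/api/pandas.Series.str.strip.html#pandas.Series.str.strip
# Date handling
Time to string (strftime, "string from time"):
https://pandas.pydata.org/docs/reference/api/pandas.DatetimeIndex.strftime.html#pandas.DatetimeIndex.strftime

`Timestamp.strptime` is documented as "Function is not implemented. Use pd.to_datetime().", so parse strings with `pd.to_datetime()` instead.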
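
As a minimal sketch of the round trip between strings and datetimes in pandas, the example below parses strings with `pd.to_datetime` and formats them back with `Series.dt.strftime`. The sample dates and the variable names `dates`, `parsed`, and `formatted` are made up for illustration and do not come from the class material.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original notes
dates = pd.Series(["2024-08-23", "2024-12-31"])

# String -> datetime: parse with an explicit format
parsed = pd.to_datetime(dates, format="%Y-%m-%d")

# Datetime -> string: format each value through the .dt accessor
formatted = parsed.dt.strftime("%Y/%m/%d")

print(pd.api.types.is_datetime64_any_dtype(parsed))  # True
print(formatted.tolist())  # ['2024/08/23', '2024/12/31']
```

Passing `format` explicitly avoids pandas having to guess the layout of the strings, which matters when day and month could be confused.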
String to time (strptime, "string parse time"):
https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.strptime.html#pandas.Timestamp.strptime
![image](https://hackmd.io/_uploads/Byn3CsBiA.png)

Republic of China (Minguo) calendar years have to be converted to Gregorian years by adding 1911.
Use `apply` with a custom function:
```python
import pandas as pd

# pandas built-in function
# df['bdy'] = pd.to_datetime()
# Normal usage: convert the dates and specify the year format
# df['bdy'] = pd.to_datetime(df['bdy'], errors='coerce', format='%y-%m-%d')
# Minguo to Gregorian using a loop
# for index, value in enumerate(df['bdy']):
#     year, month, day = value.split("-")   # split the string into year, month, day
#     year = int(year) + 1911               # convert the Minguo year to a Gregorian year
#     new_date = f"{year}-{month}-{day}"    # build the new date string
#     df.loc[index, 'bdy'] = new_date       # update df by index
#
# df['bdy'] = pd.to_datetime(df['bdy'])
def right_date(value):
    """Convert a 'YYY-MM-DD' Minguo date string to a Gregorian date string."""
    year, month, day = value.split("-")
    year = int(year) + 1911
    return f"{year}-{month}-{day}"
# Convert the Minguo years to Gregorian years
df['bdy'] = df['bdy'].apply(right_date)
# Then convert the resulting strings to datetime
df['bdy'] = pd.to_datetime(df['bdy'])
```
pandas has a `Timestamp` object:
https://timestamp.onlinealat.com
https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.year.html

A pandas Series has year / month / day (through `.dt`):
https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.year.html#
## pandas.to_datetime
https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html#pandas-to-datetime
# DataFrame: two-dimensional data object
Columns (直行): `columns`
Rows (橫列): `index`
Compare with the one-dimensional Series above (not yet copied into these notes).
# Learning resources:
A tutorial that (mostly) uses DataFrame examples:
https://ithelp.ithome.com.tw/articles/10343746

Shows how to work through real cases, with great charts; part of a series of data science articles:
https://medium.com/ntu-data-analytics-club/python-advanced-一起來用-pandas-玩轉數據-6d06d805941a

Day 10: Learning Pandas - Series, DataFrame, Index
https://ithelp.ithome.com.tw/articles/10204656

Smart Stock Market Analyst: automated scraping and analysis of historical stock price data with Python
https://hackmd.io/@Tj7zBL4CS-CymiaJeR8Uew/SkZ8TFvB6
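
To round off the DataFrame section above (columns versus index, and how a DataFrame relates to a Series), here is a small sketch. The sample rows, the index labels `s1` and `s2`, and the Gregorian birthdays are made up for illustration; only the column names `name` and `bdy` are reused from the notes.

```python
import pandas as pd

# Hypothetical sample data, reusing the column names from the notes
df = pd.DataFrame(
    {"name": ["Larry", "Lin"], "bdy": ["1990-11-30", "1985-12-12"]},
    index=["s1", "s2"],
)
df["bdy"] = pd.to_datetime(df["bdy"])

print(list(df.columns))  # column labels: ['name', 'bdy']
print(list(df.index))    # row labels: ['s1', 's2']

# Selecting one column returns a one-dimensional Series
# that shares the DataFrame's row index
col = df["bdy"]
print(type(col).__name__)  # Series

# Selecting one row with .loc also returns a Series,
# this time indexed by the column labels
row = df.loc["s1"]
print(list(row.index))  # ['name', 'bdy']

# A datetime Series exposes year / month / day through .dt
print(col.dt.year.tolist())  # [1990, 1985]
```

Checking `type(...)` after each selection, as the teacher suggested, is a quick way to tell whether the next method call should be looked up under Series or under DataFrame.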
