Try   HackMD

移除試算表中重複的資料(使用 LibreOffice Calc、Google Sheets、Microsoft Excel)

作者:王一哲
日期:2021/10/23

前言

當我們使用試算表軟體處理大量的資料時,如果要手動找出並移除重複的資料,不但要花費很多的時間,甚至有可能會漏掉一些重複的資料。其實在 LibreOffice Calc、Google Sheets、Microsoft Excel 都有內建的工具可以處理,以下是這些工具的使用方法。

LibreOffice Calc

測試的作業系統為 Linux Mint 20.1 Ulyssa,LibreOffice 版本為 7.1.5.2,測試用的資料是從環保署環境資料開放平臺下載的土城空氣品質測站一氧化碳濃度月值資料。下圖是將資料以 csv 檔格式匯入後的樣子。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
將資料𠥔入 LibreOffice Calc

使用預設的篩選工具

於工具列中依序點選資料更多篩選標準篩選

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

資料 ⇒ 更多篩選 ⇒ 標準篩選

於標準篩選視窗中,不要指定欄位名稱,勾選無重複複製結果到保留篩選準則,選取存放篩選後資料的儲存格,下圖中選取的是同一個分頁中的 J1 儲存格,最後點選右下角的確定

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

標準篩選視窗

下圖是移除重複資料的成果,由於原始資料中 MonitorMonth 202106 有三筆資料,而且在這三筆資料中 Concertration 欄位的數值有 0.23、0.24 兩個不同的值,因此這兩筆資料會同時被保留下來。如果想要依據指定的欄位移除重複的資料,需要安裝擴充套件才行。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

移除重複資料的成果

使用擴充套件的篩選工具

目前在 LibreOffice 網站上搜尋到的擴充套件有兩個,分別是 Remove DuplicatesRemove Duplicates Fast,我測試過兩個擴充套件之後發現,它們好像是同一個擴充套件的不同版本,而不是獨立開發的兩個元件。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Remove Duplicates

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
Remove Duplicates Fast

以下採用 Remove Duplicates 1.0.6。下載擴充套件後使用 LibreOffice 開啟檔案進到安裝畫面按下確定

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
安裝擴充套件畫面

授權協議,按接受繼續安裝。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

授權協議

安裝擴充套件之後需要重新啟動 LibreOffice,但是重新啟動又會再問一次是否要安裝擴充套件,這次要按取消,不然會再跑一次上面的步驟。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

安裝擴充套件完成

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
重新啟動 LibreOffice

再開啟先前的試算表,此時在資料選單裡會多一項刪除重複項,畫面左側會多一個對應的按鈕。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

資料 ⇒ 刪除重複項

選取 A 到 H 欄的資料,再點選刪除重複項工具,於下方的視窗中勾選選取區域包含了題名,再點選 MonitorMonth,最後點選確定

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

刪除重複項視窗

下圖是刪除重複執行的結果,刪除了16筆重複的資料。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

刪除重複結果

這次就只留下 MonitorMonth 欄位中 202106 第一列的資料。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

刪除重複資料後的試算表

如果想要將這個功能的按鈕固定在工具列中,可以從選單中依序點選檢視工具列自訂

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

檢視 ⇒ 工具列 ⇒ 自訂

類別中選選巨集,然後搜尋duplicate,點選可用指令中搜尋到的RemoveDuplicates,然後點向右的箭頭將它加到工具列中,勾選這個項目後按確定

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

自訂工具列視窗

在工具列的右上角出現了剛才新增的按鈕。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

新增後的按鈕

Google Sheets

先將資料匯入到 Google Sheets。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

匯入資料至 Google Sheets

選取 A 到 H 欄的資料,再從選單中依序點選資料移除重複內容

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

資料 ⇒ 移除重複內容

在移除重複內容視窗中勾選資料包含標題列,由於要分析的資料欄預設為全選,取項全選然後再勾選G 欄 (MonitorMonth),最後點選移除重複內容

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

移除重複內容視窗

共移除 16 筆重複的內容,留下 339 筆資料。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

移除重複內容成果

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
移除重複資料後的試算表

Microsoft Excel

測試的環境為 Windows 10 家用版、Microsoft 365 Apps 企業版。先資料料匯入 Excel。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

匯入資料至 Excel

選取 A 到 H 欄的資料,再從選單中依序點選資料移除重複項

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

資料 ⇒ 移除重複項

在移除重複項視窗中先取消全選,再勾選MonitorMonth,最後點選確定

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

移除重複項視窗

共移除 16 筆重複的內容,不包含標題列共留下 338 筆資料,。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

移除重複項成果

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
移除重複資料後的試算表

結語

因為我們在探究與實作的課程中有一個題目,要求學生從環保署的網站搜尋並整理空氣品質的資料,但是下載後的資料量很大,其中又有一些重複的資料。學生遇到這個狀況時,很直覺地開始手動尋找並刪除重複的資料,沒有想要先上網搜尋一下是否有更方便的作法,我只好當場示範一下這個強大的功能。希望這篇筆記能幫助到更多有需要的人。


tags:LibreOfficeGoogle SheetsExcel