Python Beginner Tutorial - Automate the Boring Stuff with Python

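---
title: Python Beginner Tutorial - Automate the Boring Stuff with Python
tags: python_beginner
---

> [TOC]
>

# Automate the Boring Stuff with Python

:::info
Reference book: [Python 自動化的樂趣：搞定重複瑣碎&單調無聊的工作](https://www.books.com.tw/products/0010739372) (the Traditional Chinese edition of *Automate the Boring Stuff with Python*)
:::

![](https://i.imgur.com/ddFMtyn.png)

## Lab1: A simple password manager

:::info
Package: pyperclip
pip install pyperclip
:::

* What is a password manager?
  Most of us reuse one password everywhere. But if any one of those sites is breached, an attacker can use that password to get into all of your accounts, which is very unsafe. A better approach is to use a different password for every site and record them in a password manager. To log in, you enter the manager's master password and copy out the password you need. Here we implement a small Python program built on the same idea, but do not use it for real passwords: it stores them in plain text inside the script, so it is very insecure. The same idea does work well as a notebook of frequently used phrases. If there are sentences you type often and find tedious to type, record them here and copy them out when you need them.
* Code
  ```python=
  #! python3
  # pw.py - An insecure password locker, for demonstration only.
  import sys

  import pyperclip

  PASSWORDS = {'email': 'FZ@2213414bc%$sdf',
               'blog': 'fjs&12ACjef@123',
               'luggage': '12345'}

  if len(sys.argv) < 2:
      print('Usage: python pw.py [account]')
      sys.exit()

  account = sys.argv[1]  # the first command-line argument is the account name

  if account in PASSWORDS:
      pyperclip.copy(PASSWORDS[account])
      print('Password for ' + account + ' copied to clipboard.')
  else:
      print('There is no account named ' + account)
  ```
* How to run it quickly
    * Create a folder to hold all of your scripts, and add that folder's path to the PATH environment variable
    * Give every Python script a matching .bat file
      ```bash=
      @py.exe C:\path\to\your\scripts\pw.py %*
      @pause
      ```
    * Press Win+R and type the script's name to run the Python file directly

## Lab2: Adding a bullet point in front of every line

:::info
Package: pyperclip
pip install pyperclip
:::

Suppose the following text is on your clipboard:

```
Books are for use.
Every person his or her book.
Every book its reader.
Save the time of the reader.
The library is a growing organism.
```

You want to put a \* in front of every line. A Python script can do this in three steps:

1. Paste the text from the clipboard into Python
2. Process it
3. Copy the new text back to the clipboard

Code:

```python=
#! python3
import pyperclip

text = pyperclip.paste()

# Split the text into lines and prefix each one with '* '.
lines = text.split('\n')
for i in range(len(lines)):
    lines[i] = '* ' + lines[i]
text = '\n'.join(lines)

pyperclip.copy(text)
```

* Exercise: what if you want numbering such as 1, 2, 3 instead of bullet points? How would you do it?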
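The processing step in Lab2 can also be tried without touching the clipboard. The following is only a minimal sketch: `add_prefix` is a hypothetical helper name that does not appear in the lab, and the sample text is shortened from the passage above.

```python
def add_prefix(text, prefix='* '):
    """Return text with prefix added to the start of every line.

    Hypothetical helper: it mirrors the split / modify / join steps
    of the Lab2 script, but leaves the clipboard out.
    """
    lines = text.split('\n')
    return '\n'.join(prefix + line for line in lines)


sample = 'Books are for use.\nEvery person his or her book.\nEvery book its reader.'
print(add_prefix(sample))
```

Keeping the text transformation in its own function means the clipboard calls (`pyperclip.paste()` and `pyperclip.copy()`) can be wrapped around it later, and the transformation itself can be checked with ordinary strings.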
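## Lab3: Extracting every email address

Suppose your boss asks you to copy out every email address in an article. Doing that by hand is exhausting, but with the approach below you only need to press Ctrl+A to select everything, Ctrl+C to copy, and then run this script. It copies every email address in the article to the clipboard automatically.

### Regular Expression

The hardest part is deciding whether a given piece of text is an email address. We can filter out email addresses with a regular expression. Before that, think about how you would write a program that finds landline phone numbers (e.g. 02-2212-1234) in a passage.

First, checking whether a string is a phone number could look like this:

```python=
def isPhoneNum(text):
    # A number such as 02-1234-1234 is exactly 12 characters long.
    if len(text) != 12:
        return False
    # Two digits for the area code.
    for i in range(0, 2):
        if not text[i].isdecimal():
            return False
    if text[2] != '-':
        return False
    # Four digits, a dash, then four more digits.
    for i in range(3, 7):
        if not text[i].isdecimal():
            return False
    if text[7] != '-':
        return False
    for i in range(8, 12):
        if not text[i].isdecimal():
            return False
    return True
```

Then pull the phone numbers out of a passage.

* Text:
  ```
  Call me at 02-1234-1234 tomorrow. 02-4321-4321 is my office.
  ```
* Code:
  ```python=
  message = 'Call me at 02-1234-1234 tomorrow. 02-4321-4321 is my office.'
  # Slide a 12-character window over the message and test each chunk.
  for i in range(len(message)):
      chunk = message[i:i+12]
      if isPhoneNum(chunk):
          print('Phone number found: ' + chunk)
  print('Done')
  ```

Tedious and fiddly, isn't it? And this only finds phone numbers, which are nothing more than digits and dashes. What about an email address? A street address? An ISBN? A library call number?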
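Just thinking about it is a headache.

* What is a regular expression?
  Before explaining what a regular expression (regex for short) is, first look at how to use one in Python to pick out a landline number:
  ```python
  import re

  phoneNumRegex = re.compile(r'\d\d-\d\d\d\d-\d\d\d\d')
  num = phoneNumRegex.search('My number is 02-1234-1234.')
  print(num.group())
  ```
  That simple. Put briefly, a regex uses special symbols to stand for classes of characters. For example, `\d` stands for a digit (decimal), so `\d\d-\d\d\d\d-\d\d\d\d` says that what we are looking for has the format `digit digit - digit digit digit digit - digit digit digit digit`.
* Grouping
  To use regex in Python you first need to understand groups. Taking the code above as an example:
  ```python
  import re

  phoneNumRegex = re.compile(r'(\d\d)-(\d\d\d\d-\d\d\d\d)')
  num = phoneNumRegex.search('My number is 02-1234-1234.')
  print(num.group(1))
  print(num.group(2))
  print(num.group(0))  # 0, or no argument at all, returns the entire match
  print(num.group())
  ```
  Note that parentheses were added to the regex. Each pair of parentheses is one group, and the `.group()` method retrieves a specific group.
* Question mark
  A question mark makes part of the pattern optional. For example, to find either Superman or Superwoman in a text, notice that `wo` appears either zero times or once, so the code is:
  ```python=
  import re

  superRegex = re.compile(r'Super(wo)?man')
  result1 = superRegex.search('The Adventures of Superman.')
  print(result1.group())
  result2 = superRegex.search('The Adventures of Superwoman.')
  print(result2.group())
  ```
  Here `wo` is made into a group, and the question mark after the group means it may or may not be present.
* Asterisk
  An asterisk matches zero or more times. This is the same example as above, with the question mark changed to an asterisk:
  ```python=
  import re

  superRegex = re.compile(r'Super(wo)*man')
  result1 = superRegex.search('The Adventures of Superman.')
  print(result1.group())
  result2 = superRegex.search('The Adventures of Superwoman.')
  print(result2.group())
  result3 = superRegex.search('The Adventures of Superwowowowowowoman.')
  print(result3.group())
  ```
* Plus sign
  A plus sign matches one or more times. Look at `result1`: the text does not contain `wo` at least once, so `search` finds nothing and returns `None`.
  ```python=
  import re

  superRegex = re.compile(r'Super(wo)+man')
  result1 = superRegex.search('The Adventures of Superman.')
  print(result1 is None)
  result2 = superRegex.search('The Adventures of Superwoman.')
  print(result2.group())
  result3 = superRegex.search('The Adventures of Superwowowowowowoman.')
  print(result3.group())
  ```
* Curly braces
  Curly braces specify an exact number of repetitions, e.g. HAHAHA => `(HA){3}`. You can also write `{a,b}` for at least a and at most b repetitions; `{a,}` means a or more, and `{,b}` means b or fewer.

  :::warning
  Note: do not put a space after the comma
  :::

  ```python=
  import re

  haRegex = re.compile(r'(HA){,3}')
  result1 = haRegex.search('HAHAHA')
  print(result1.group())
  result2 = haRegex.search('HA')
  print(result2.group())
  result3 = haRegex.search('HAHAHAHA')
  print(result3.group())
  ```
  Note that `result3` captures only three HAs.
* Using findall to get every matching string
  Taking the landline example above, to find both phone numbers in the text:
  ```python=
  import re

  phoneNumRegex = re.compile(r'\d\d-\d\d\d\d-\d\d\d\d')
  nums = phoneNumRegex.findall('Call me at 02-1234-1234 tomorrow. 02-4321-4321 is my office.')
  print(nums)
  ```
* Character classes
    * `\d`: 0~9
    * `\D`: any character other than 0~9
    * `\w`: any letter, digit, or underscore
    * `\W`: any character other than a letter, digit, or underscore
    * `\s`: a space, tab, or newline
    * `\S`: any character other than a space, tab, or newline
* Making your own classes
  Square brackets let you define your own class. `[0-5]` stands for any one digit from 0 to 5, and you can also write `[a-e]`, `[a-zA-Z0-9]`, `[aeiouAEIOU]`, and so on. A `^` placed right after the opening bracket means every character except the ones in the brackets, e.g. `[^aeiouAEIOU]` matches every character that is not a vowel.
* `^` and `$`
  A `^` at the start of a pattern means "a string that begins with ...". Do not confuse it with the `^` inside square brackets; the two do different things.
  ```python=
  import re

  beginWithHello = re.compile(r'^Hello')
  result1 = beginWithHello.search('Hello world!')
  print(result1.group())
  result2 = beginWithHello.search('He said hello.')
  print(result2 is None)
  ```
  `$` means "ends with ...". For example, to find a string that ends with a digit:
  ```python=
  import re

  endWithNum = re.compile(r'\d$')
  result1 = endWithNum.search('Your number is 77')
  print(result1.group())
  result2 = endWithNum.search('Your number is seventy seven')
  print(result2 is None)
  ```
  To match a string that consists only of digits from start to end, write:
  ```python=
  import re

  wholeStrIsNum = re.compile(r'^\d+$')
  result1 = wholeStrIsNum.search('123456789')
  print(result1.group())
  result2 = wholeStrIsNum.search('12345xyz6789')
  print(result2 is None)
  ```
* Period
  A period works as a wildcard that matches any single character except a newline. For example, to find "something-at" (e.g. cat, hat, bat...):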
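  ```python=
  import re

  atRegex = re.compile(r'.at')
  r = atRegex.findall('The cat in the hat sat on the flat mat.')
  print(r)  # ['cat', 'hat', 'sat', 'lat', 'mat'] ('flat' only contributes 'lat')
  ```
  This is a good moment to mention the dot-star (`.*`) that you will see often. It grabs as many characters as it can until it cannot grab any more. An example:
  ```python=
  import re

  nameRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
  r = nameRegex.search('First Name: Yang Last Name: Sam')
  print(r.group())
  print(r.group(1))  # Yang
  print(r.group(2))  # Sam
  ```

### The email extraction program

:::info
Packages: re, pyperclip
:::

* The regex for email addresses
  ```python=
  import re
  import pyperclip

  emailRegex = re.compile(r'''(
      [a-zA-Z0-9._%+-]+      # username
      @                      # @ symbol
      [a-zA-Z0-9-]+          # domain name
      (\.[a-zA-Z]{2,4})+     # dot-something
      )''', re.VERBOSE)
  ```
  Writing it this way (with `re.VERBOSE`) keeps the regex from being crammed onto one line and makes it easier to read.
* Find every matching string on the clipboard
  ```python=
  text = str(pyperclip.paste())
  matches = []
  # With groups in the pattern, findall returns tuples; index 0 is the full address.
  for groups in emailRegex.findall(text):
      matches.append(groups[0])
  ```
* Join everything that was found into one string and put it on the clipboard
  ```python=
  if len(matches) > 0:
      pyperclip.copy('\n'.join(matches))
      print('Copied to clipboard')
  else:
      print('No email addresses found.')
  ```

Using the [full-time faculty page of the FJU Department of Library and Information Science](http://web.lins.fju.edu.tw/chi/full-time) as an example, the result is:

```
cclee@blue.lins.fju.edu.tw
lins1005@mail.fju.edu.tw
lins1022@mail.fju.edu.tw
yuanho@blue.lins.fju.edu.tw
ryanchen@lins.fju.edu.tw
080677@mail.fju.edu.tw
084361@mail.fju.edu.tw
scchen@blue.lins.fju.edu.tw
141646@mail.fju.edu.tw
office@blue.lins.fju.edu.tw
```

## HW1: Masking national ID numbers

Replace every national ID number in a text with the form `F******789` to protect personal data.

Example: I am XXX, and my ID number is F123456789
Replaced with: I am XXX, and my ID number is `F******789`

## Lab4: Multi-clipboard

### Introduction

An ordinary clipboard holds only one piece of text, so it cannot cope when you have several different pieces. This script lets the user store each piece of text under a keyword and later call it back by keyword to paste it.

### Usage example

```bash=
mcb save phonenum # save the current clipboard text under the keyword phonenum
mcb save email # save the current clipboard text under the keyword email
mcb phonenum # load the text stored as phonenum back onto the clipboard
mcb list # copy all keywords to the clipboard
```

### The shelve module

The shelve module saves Python variables to a shelf file. On Windows this produces three files (.bak, .dat, .dir), while on a Mac only a single .db file is created.

#### Usage

Writing, reading, deleting, and modifying values in a shelf works much like operating on a dictionary: everything goes through keys and values.

* Writing to a shelf
  ```python=
  # shelve_write.py
  import shelve

  shelfFile = shelve.open('mydata')    # open the shelf file
  cats = ['Zophie', 'Pooka', 'Simon']  # build a list of cat names
  shelfFile['cats'] = cats             # store the list in the shelf
  shelfFile.close()                    # close the shelf file
  ```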
* Reading from a shelf
  ```python=
  # shelve_read.py
  import shelve

  shelfFile = shelve.open('mydata')  # open the shelf file
  print(shelfFile['cats'])
  shelfFile.close()                  # close the shelf file
  ```
* Deleting a value from a shelf
  ```python=
  # shelve_delete.py
  import shelve

  shelfFile = shelve.open('mydata')  # open the shelf file
  del shelfFile['cats']
  shelfFile.close()                  # close the shelf file
  ```

### What the program has to do

The user types: `mcb [argument] [keyword]`

* Check whether the user supplied a keyword along with the argument
* If the argument is save, store the clipboard content under the keyword
* If the argument is list, copy all keywords to the clipboard
* If the argument is neither save nor list, treat it as a keyword and copy its content to the clipboard

Overall, the code needs to:

* Read the arguments from sys.argv
* Read and write the clipboard
* Save to and load from the shelf file

### Implementation

#### STEP1: Set up the shelf

```python=
import shelve, pyperclip, sys

mcbShelf = shelve.open('mcb')

# TODO: Save clipboard content.

# TODO: List keywords, delete/load content.

mcbShelf.close()
```

#### STEP2: Save the clipboard content under a keyword

```python=
# --snip--
# Save clipboard content.
if len(sys.argv) == 3:
    if sys.argv[1].lower() == 'save':
        mcbShelf[sys.argv[2]] = pyperclip.paste()
```

#### STEP3: List keywords and load content

```python=
# --snip--
# Save clipboard content.
if len(sys.argv) == 3:
    if sys.argv[1].lower() == 'save':
        mcbShelf[sys.argv[2]] = pyperclip.paste()
# List keywords or load content.
elif len(sys.argv) == 2:
    if sys.argv[1].lower() == 'list':
        pyperclip.copy(str(list(mcbShelf.keys())))
    elif sys.argv[1] in mcbShelf:
        pyperclip.copy(mcbShelf[sys.argv[1]])

mcbShelf.close()
```

#### HW2-1: Add a delete feature

Typing `mcb delete [keyword]` should delete that keyword.

#### HW2-2: Make list print every keyword and its content in the terminal

> Hint: think about how to list every key and value in a dictionary

## Lab5: Finding emails in every .txt file in a folder

### STEP1: Find all the .txt files

The method str.endswith("...") tells you whether a string ends with ...

```python=
import re, os
from collections import defaultdict  # explained in a moment

path = os.getcwd()  # os.getcwd() returns the current working directory
txt_files = []
for file in os.listdir(path):
    if file.endswith(".txt"):
        txt_files.append(file)
```

### defaultdict

Python's collections module provides many handy data structures, and defaultdict is one of them.

What makes defaultdict handy is that when you want a value to be a list, you do not have to declare the key's value as a list first; you can append to it right away. For example, here is the same thing with a plain dict:

```python=
d = {}
cats = ['Meoww', 'White', 'Fluffy']
d['cats'].append(cats)  # raises KeyError: 'cats'
```
This fails: Python cannot find a key named cats and raises a `KeyError`. To make it work, you first have to set the key 'cats' to an empty list. That looks harmless here, but in some situations it becomes a nuisance.

```python=
d = {}
d['cats'] = []
cats = ['Meoww', 'White', 'Fluffy']
d['cats'].append(cats)
```

Now with defaultdict:

```python=
from collections import defaultdict

d = defaultdict(list)
cats = ['Meoww', 'White', 'Fluffy']
d['cats'].append(cats)  # d['cats'] is now [['Meoww', 'White', 'Fluffy']]
```

There is no need to declare the key as a list. The first time we use the key, Python creates a default value for it, which here is an empty list.

### Reading the files

Here we open each file, read all of its text into one string, and hand that string to the regex.

```python=
matches = defaultdict(list)
for file in txt_files:
    # The with statement closes the file automatically.
    with open(file, 'r', encoding='utf8') as f:
        text = f.read()
    for groups in emailRegex.findall(text):
        matches[file].append(groups[0])
```

### Full code

```python=
# This script finds every email address in the .txt files of a folder
# and prints them in the terminal.
# Usage: change to the target folder, then run python txtFinder

import re, os
from collections import defaultdict

emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9-]+          # domain name
    (\.[a-zA-Z]{2,4})+     # dot-something
    )''', re.VERBOSE)

# Collect the names of all .txt files in the current working directory.
path = os.getcwd()
txt_files = []
for file in os.listdir(path):
    if file.endswith(".txt"):
        txt_files.append(file)

# Map each file name to the list of addresses found in it.
matches = defaultdict(list)
for file in txt_files:
    with open(file, 'r', encoding='utf8') as f:
        text = f.read()
    for groups in emailRegex.findall(text):
        matches[file].append(groups[0])

for key, value in matches.items():
    print('Email(s) in "' + key + '":')
    for email in value:
        print(email)
    print()
```
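To try the matching and grouping logic of Lab5 without creating any files, the sketch below feeds the same regex from an in-memory dictionary. Everything in `fake_files` is made-up sample data, and `collect_emails` is a hypothetical helper name, not part of the lab.

```python
import re
from collections import defaultdict

# Same pattern as in the lab.
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9-]+          # domain name
    (\.[a-zA-Z]{2,4})+     # dot-something
    )''', re.VERBOSE)

# Hypothetical stand-ins for file names and file contents (made-up data).
fake_files = {
    'a.txt': 'Write to alice@example.com or bob.smith@example.org.',
    'b.txt': 'No addresses here.',
    'c.txt': 'carol@example.net',
}


def collect_emails(files):
    """Map each name to the list of email addresses found in its text."""
    matches = defaultdict(list)
    for name, text in files.items():
        for groups in emailRegex.findall(text):
            matches[name].append(groups[0])  # index 0 is the full address
    return matches


matches = collect_emails(fake_files)
for name, emails in matches.items():
    print(name, emails)
```

Because a defaultdict only creates a key when it is first accessed, a file with no addresses (`b.txt` here) never appears in `matches`. One thing to keep in mind about the pattern as written: each dot-separated part after the domain name is limited to 2 to 4 letters, so an address whose domain has a longer part after the first dot would be cut short.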
