
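Week 1: Big Data Analytics Practice & Business Intelligence and Big Data Analysis

Loading the files, environment setup, quick Python review, and course rules

tags: Big Data Analytics Practice, Business Intelligence and Big Data Analysis, Master's program, Review notes, NKUST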

Week 1 content

Date: 2024/2/21 (Wed)
2025/2/18 (Tue), 2025/2/20 (Thu)

Overview

Build a public-opinion analysis website.
Weekly in-class assignments; a website report is due at the midterm and at the final.

Session 1

An introduction to the public-opinion analysis website, current trends, and the new knowledge you can gain from the course.

What you will learn


Python web scraping, Django, NLP (text2text)
Website design (HTML + CSS + JS), ML & DL (MLP, CNN), LLM API

Grading


Each student completes a project individually (discussing with classmates and using LLM tools are allowed).

Session 2

Important

Quick review of basic Python.
Learn to use Jupyter Notebook (.ipynb).

You can run the notebooks on Colab, but the instructor would like everyone to download the template files and run them locally in VS Code.

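STEP
  1. In the Documents folder on the C: drive, create a new folder named bigdata.
  2. Put the instructor's template files into that folder.
  3. Open the folder in VS Code (in the cmd terminal, cd into the folder and run code .).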
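If you prefer to create the folder from Python instead of File Explorer, the sketch below performs step 1. It assumes the Documents folder is at Path.home() / "Documents". That is typical on Windows but not guaranteed, because OneDrive can redirect it. The helper name make_course_folder is hypothetical and not part of the course files.

```python
from pathlib import Path


def make_course_folder(base: Path, name: str = "bigdata") -> Path:
    """Create <base>/<name> if it does not exist and return its path.

    Hypothetical helper; the course only asks you to create the folder by hand.
    """
    folder = base / name
    # parents=True creates missing parent folders; exist_ok=True makes the
    # call safe to run more than once.
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

Usage: make_course_folder(Path.home() / "Documents") returns the folder path. cd into that path and run code . to open it in VS Code.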

Homework: environment setup

Download the files

Note

Select all files in w01 and download them.


If the file link no longer works because the instructor has changed it, download the files from the GitHub repository instead.

Important

bigdata/2025/class1 at main · chiaoshin/bigdata | GitHub repository download link


Software and environment installation, STEP BY STEP

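  1. Use the VS Code editor
  2. Download Python 3.10.6 or later
  3. Install Miniconda3 (the latest version is fine)
    You will create a virtual environment later; see the Miniconda installation guide near the end of this article.
  4. Install the Python packages provided by the instructor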
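To confirm that the interpreter you are using meets the minimum version in step 2, you can run a quick check in a notebook cell or script. This is a minimal sketch. It covers only the 3.10.6 threshold stated above and does not know which versions the instructor's packages actually require.

```python
import sys

# Minimum version from step 2 of the installation list.
MIN_VERSION = (3, 10, 6)


def meets_minimum(version_info=None, minimum=MIN_VERSION) -> bool:
    """Return True if version_info (default: the running Python) >= minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Tuples compare element by element: (3, 11, 0) >= (3, 10, 6) is True.
    return tuple(version_info[:3]) >= tuple(minimum)


if meets_minimum():
    print("OK:", sys.version)
else:
    required = ".".join(map(str, MIN_VERSION))
    print(f"Python {required} or later is required, found {sys.version}")
```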

Finally, open the file

00-Python introduction-very simple version.ipynb

Learn how to use Python

Note!
The instructor covers basic Python usage in week 1, but it is still advisable to have some Python background before taking the course.

Click Select Kernel at the top right of the notebook and choose the Python environment.

e.g., ai25 (Python 3.10.12)
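After choosing a kernel, you can confirm which interpreter the notebook is actually using. This is a minimal sketch. It assumes named conda environments live in a folder such as <conda root>/envs/ai25, so the last component of sys.prefix is the environment name. The helper env_name_from_prefix is hypothetical.

```python
import sys
from pathlib import Path


def env_name_from_prefix(prefix: str) -> str:
    """Guess the conda environment name from an interpreter prefix.

    Assumption: named environments live in <conda root>/envs/<name>. For the
    base environment this returns the install folder name (e.g. "miniconda3").
    """
    return Path(prefix).name


# Run this in a notebook cell after selecting the kernel.
print("Interpreter:", sys.executable)
print("Environment:", env_name_from_prefix(sys.prefix))
```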


Note

If you have other environments, you can switch kernels the same way to use another project's environment.


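Run jieba to produce a frequency count (the number of times each keyword appears).

This is from the instructor's 2024 course. Word segmentation and vocabulary analysis come up around week 3, so you can skip opening this file for now.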
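The frequency-count step comes down to counting tokens after segmentation. Below is a minimal sketch with collections.Counter. It assumes the text has already been segmented into a list of words; in the course jieba handles segmentation, and, for example, jieba.lcut(text) returns such a list. The helper top_keywords, the stopword set, and the sample tokens are illustrative only, not the instructor's code.

```python
from collections import Counter


def top_keywords(tokens, stopwords=frozenset(), n=10):
    """Return the n most frequent tokens as (token, count) pairs.

    Tokens that are stopwords, whitespace-only, or a single character are
    skipped; the single-character filter is a common but optional choice.
    """
    counts = Counter(
        t for t in tokens
        if t.strip() and t not in stopwords and len(t) > 1
    )
    return counts.most_common(n)


# Hypothetical tokens, as a segmenter such as jieba.lcut might return them.
tokens = ["大數據", "分析", "的", "大數據", "網站", "分析", "大數據", " "]
print(top_keywords(tokens, stopwords={"的"}, n=3))
# → [('大數據', 3), ('分析', 2), ('網站', 1)]
```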


Python version management

You can use Scoop to install several Python versions, switch between them with scoop reset <version>
(e.g., scoop reset python310), and then check the active version with python -V.

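# Add the versions bucket, which provides versioned packages such as python310
scoop bucket add versions

# Search for available Python packages
scoop search python

# Install Python 3.10 (the package name has no space)
scoop install python310

# If several Python versions are installed, switch between them
scoop reset python310

# Finally, check the current Python version (capital -V; lowercase -v turns on verbose mode)
python -V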

References:

Additional recommended software

Terminal


Windows Terminal - Free download and install on Windows | Microsoft Store download link

microsoft/terminal: The new Windows Terminal and the original Windows console host, all in the same place! | official open-source GitHub repository and downloads

Instructor's installation guide_Powershell usage and installation (Optional) - Google Drive
profile.ps1 download | automatic line wrapping

Reference article: Beautifying the Terminal | Windows - Windows Terminal - HackMD

Tip

You can install open-source software with Scoop.


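Miniconda3

Instructor's installation guide_miniconda | Google Drive
Install it with Scoop as well.

Check whether miniconda is available
scoop search miniconda

Install one of the versions that matches the course requirements
scoop install miniconda3-4.12.0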


After installation, you can find Anaconda Prompt by searching in Windows.
It is a terminal interface (cmd, PowerShell).


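Create a virtual environment with Miniconda

A virtual environment keeps each project's packages and versions isolated, so they do not interfere with other projects on your machine.

conda env list                      # list virtual environments
conda create -n ai24 python=3.10    # create a virtual environment named ai24
conda activate ai24                 # activate the virtual environment
conda deactivate                    # leave the virtual environment
conda env remove -n ai24            # delete the virtual environment named ai24
conda init powershell               # initialize conda for PowerShell
pip list                            # list the packages installed in the environment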
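If you want to check from Python which environments conda knows about, you can parse the JSON form of conda env list. This is a sketch under one assumption: conda env list --json prints an object whose "envs" key holds a list of environment paths, as recent conda versions do. The helper names env_names and list_conda_envs are hypothetical.

```python
import json
import re
import shutil
import subprocess


def env_names(conda_json: str) -> list[str]:
    """Extract environment folder names from `conda env list --json` output.

    The base environment appears under its install folder name
    (for example "miniconda3"), not as "base".
    """
    paths = json.loads(conda_json)["envs"]
    # Split on both "\" and "/" so Windows and Unix paths are handled alike.
    return [re.split(r"[\\/]", p.rstrip("\\/"))[-1] for p in paths]


def list_conda_envs() -> list[str]:
    """Run conda and return the environment names (conda must be on PATH)."""
    conda = shutil.which("conda")  # also finds conda.bat / conda.exe on Windows
    if conda is None:
        raise FileNotFoundError("conda was not found on PATH")
    result = subprocess.run(
        [conda, "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return env_names(result.stdout)
```

Parsing is kept separate from running conda, so env_names can be tried on saved output without conda installed.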

Note

This year is 2025 (academic year 113-2), so create an ai25 virtual environment instead.

Create a virtual environment named ai25 with Python 3.11
conda create -n ai25 python=3.11

Activate the virtual environment
conda activate ai25

Tip

In PowerShell, highlighting text with the mouse is enough to copy and paste it, so you do not need the Ctrl+C / Ctrl+V shortcuts.

Install the required Python packages for the course

10-10-requirement-2024.txt download | Python packages we have to install - Google Drive

Install all packages
pip install -r 10-10-requirement-2024.txt

Install with an absolute path (this is the older requirements file)
pip install -r D:\nkust\bigdata\10-10-requirements-2023

Note

Install with a relative path
This year is 2025 (113-2), so use the 10-10-requirements-2024.txt file instead.

cd D:\nkust\bigdata2025                        # create this folder yourself, then change into it
pip install -r .\10-10-requirements-2024.txt   # install all packages
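After pip finishes, you can check which listed packages are actually installed in the active environment. This is a minimal sketch using importlib.metadata. It assumes the requirements file uses the common one-package-per-line format (for example name==version). The parser is deliberately simplified and does not handle URLs or every PEP 508 form. Both helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def requirement_names(lines):
    """Return package names from requirements-file lines (simplified).

    Skips blank lines, comments and pip options (lines starting with "-"),
    and keeps the text before any version specifier, extra or marker.
    """
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        names.append(re.split(r"[\s\[<>=!~;]", line, maxsplit=1)[0])
    return names


def installed_versions(names):
    """Map each package name to its installed version, or None if missing."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result
```

Usage: open the requirements file with encoding="utf-8" and pass the file object to requirement_names. Then pass the result to installed_versions; any name mapped to None is still missing from the environment.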

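Script execution policy

Adjust PowerShell's execution policy so that it can run scripts. Setting it for the whole machine requires running PowerShell as administrator; adding -Scope CurrentUser changes it only for your account.

Set-ExecutionPolicy RemoteSigned   # set the execution policy
Get-ExecutionPolicy                # check the current execution policy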

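The four execution policies most relevant here are:

  1. Restricted: disables running script files; this is the default on Windows client computers.
  2. AllSigned: only scripts signed by a trusted publisher can run.
  3. RemoteSigned: scripts written on the local computer run without a signature, but scripts downloaded from the internet (e.g., via email or MSN Messenger) must be signed by a trusted publisher.
  4. Unrestricted: any script can run, but a warning prompt appears before a script downloaded from the internet is run.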


The next time you start PowerShell, the virtual environment is loaded automatically.

Use PowerShell in VS Code


Install the remaining extensions

Continue installing the following extensions (plugins):

  1. Python by Microsoft (three extensions will be installed)
    - Pylance: A performant, feature-rich language server for Python in VS Code
    - Jupyter
    - Black Formatter: Python formatter
  2. Live Server (local development server with live reload for static and dynamic pages)
  3. Django by Baptiste Darthenay (Django template syntax highlighting and snippets)
  4. Prettier - Code formatter (beautifies JavaScript, JSON, CSS, Sass, and HTML in Visual Studio Code)
  5. Auto Rename Tag by Jun Han (automatically renames the paired HTML/XML tag)

Enable auto save in VS Code settings

AutoSave

  • File > Auto Save: check it to turn on automatic saving

Run the files

Open the folder 10-05 (w02-35), Traditional Chinese word segmentation.
Load the model and segment the text.


Last updated

First edition: 2024/3/27, 7:30 PM

Second edition: 2025/2/18, 12:10 PM

Third edition: 2025/2/20, 11:39 PM

Final edition: 2025/2/20, 12:00 PM