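Reusing the previously crawled multi-category news (news_preprocessed.csv): Advanced Query, Custom Keyword Popularity Analysis, and Who Was Biggest Yesterday (or Today)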
Big Data Analytics in Practice
Business Intelligence and Big Data Analytics
Master's program
For review
NKUST
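Big Data Analytics - Google Drive | Folder of Word documents, with the teacher's updated link
113-2 (2025) record link (the professor updates it at the start of the semester, so this link will expire)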
https://drive.google.com/drive/folders/1-6VP2R_Fta5PLaSfIgPLPfH9c3Bpo2Yw
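Business Intelligence and Big Data Analytics - Google Drive | Folder of Word documents, with the teacher's updated link
113-2 (2025) record link (the professor updates it at the start of the semester, so this link will expire)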
https://drive.google.com/drive/folders/1-8lLHQD9W-yb2Zd0Pk0Yw56ud7zgCamo?usp=sharing
Date: 2025/3/25 (Tue), 2025/3/27 (Thu)
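Custom keyword popularity analysis page, with news_preprocessed imported as the dataset
Who was biggest yesterday (or today): starting from the hot-person keyword analysis, add a time filter that keeps only the last day of news ("hot persons of yesterday.csv")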
Note
Select all of the w06 files and download them.
If they are unavailable, or the teacher has changed the file link, download them from the GitHub repository.
keyword query using your dataset
Add 'app_user_keyword' to INSTALLED_APPS in settings.py.
settings.py
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    # 'corsheaders',  # cross-origin requests (CORS)
    'app_top_keyword',
    'app_top_person',
    'app_user_keyword',
]
Then add path('userkeyword/', include('app_user_keyword.urls')), to urlpatterns so that the app's routes are included.
website_configs
urls.py
from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    # path('admin/', admin.site.urls),
    # top keywords
    path('topword/', include('app_top_keyword.urls')),
    # top persons
    path('topperson/', include('app_top_person.urls')),
    # user keyword analysis
    path('userkeyword/', include('app_user_keyword.urls')),
]
The front-end template links to this page with {% url 'app_user_keyword:home' %}.
templates
base.html
<!-- Advanced custom analysis -->
<div class="btn-group">
<button type="button" class="btn dropdown-toggle" data-bs-toggle="dropdown" aria-expanded="false">Advanced Query</button>
<div class="dropdown-menu">
<a class="dropdown-item" href="{% url 'app_user_keyword:home' %}">Custom Keyword Popularity Analysis</a>
<a class="dropdown-item" href="#">Custom Full-Text Search and Association Analysis</a>
<a class="dropdown-item" href="#">Custom Keyword Sentiment Analysis</a>
</div>
</div>
Then register the routes for the target page (home.html) and the dataset API.
app_user_keyword
urls.py
from django.urls import path
from app_user_keyword import views

app_name = "app_user_keyword"

urlpatterns = [
    # the first way:
    path('', views.home, name='home'),
    path('api_get_top_userkey/', views.api_get_top_userkey),
    # the second way:
    # path('top_userkey/', views.home, name='home'),
    # path('top_userkey/api_get_top_userkey/', views.api_get_top_userkey),
]
'''
# the first way:
The url path on the browser will be
http://localhost:8000/userkeyword/
# the second way:
The url path on the browser will be
http://localhost:8000/userkeyword/top_userkey/
The ajax url is as the following:
$.ajax({
type: "POST",
url: "api_get_top_userkey/",
'''
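The note above says the relative ajax url "api_get_top_userkey/" works under either route layout. A minimal sketch of why, using Python's standard urljoin to mimic how a relative URL is resolved against the page URL. The helper name resolve_ajax_url is hypothetical and only for illustration; the last call shows what would happen if the page URL lacked its trailing slash.

```python
from urllib.parse import urljoin


def resolve_ajax_url(page_url, relative_url="api_get_top_userkey/"):
    """Resolve a relative ajax url against the page url (hypothetical helper)."""
    return urljoin(page_url, relative_url)


# The first way: the page is served at /userkeyword/
first_way = resolve_ajax_url("http://localhost:8000/userkeyword/")

# The second way: the page is served at /userkeyword/top_userkey/
second_way = resolve_ajax_url("http://localhost:8000/userkeyword/top_userkey/")

# Without the trailing slash, the last path segment is replaced rather than extended
no_slash = resolve_ajax_url("http://localhost:8000/userkeyword")

print(first_way)
print(second_way)
print(no_slash)
```

In both layouts the resolved address matches a path() entry in the app's urls.py, which is why the same ajax call can be reused unchanged.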
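Next come the functions that together implement the keyword filtering.
For a detailed explanation of each function, study these three files from the teacher: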
views.py
# Create your views here.
from django.shortcuts import render
import pandas as pd
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from datetime import datetime, timedelta
# (1) we can load data using read_csv()
# global variable
# df = pd.read_csv('dataset/news_dataset_preprocessed_for_django.csv', sep='|')
# (2) we can load data using the load_df_data() function
def load_df_data():
    # df is a global variable
    global df
    df = pd.read_csv('app_user_keyword/dataset/blow_news_preprocessed.csv', sep=',')

# We should reload df when necessary
load_df_data()

# home page
def home(request):
    return render(request, 'app_user_keyword/home.html')
# When POST is used, make this function be exempted from the csrf
@csrf_exempt
def api_get_top_userkey(request):
    # (1) get keywords, category, condition, and weeks passed from frontend
    userkey = request.POST.get('userkey')
    cate = request.POST.get('cate')
    cond = request.POST.get('cond')
    weeks = int(request.POST.get('weeks'))
    key = userkey.split()
    # (2) make df_query global, so it can be used by other functions
    global df_query
    # (3) filter dataframe
    df_query = filter_dataFrame(key, cond, cate, weeks)
    # print(len(df_query))
    # (4) get frequency data
    key_freq_cat, key_occurrence_cat = count_keyword(df_query, key)
    print(key_occurrence_cat)
    # (5) get line chart data
    # key_time_freq = [
    #     '{"x": "2019-03-07", "y": 2}',
    #     '{"x": "2019-03-08", "y": 2}',
    #     '{"x": "2019-03-09", "y": 13}']
    key_time_freq = get_keyword_time_based_freq(df_query)
    # (6) response all data to frontend home page
    response = {
        'key_occurrence_cat': key_occurrence_cat,
        'key_freq_cat': key_freq_cat,
        'key_time_freq': key_time_freq,
    }
    return JsonResponse(response)
def filter_dataFrame(user_keywords, cond, cate, weeks):
    # end date: the date of the latest record of news
    end_date = df.date.max()
    # start date
    start_date = (datetime.strptime(end_date, '%Y-%m-%d').date() - timedelta(weeks=weeks)).strftime('%Y-%m-%d')
    # proceed filtering
    if (cate == "全部") and (cond == 'and'):
        df_query = df[(df.date >= start_date) & (df.date <= end_date)
                      & df.content.apply(lambda text: all((qk in text) for qk in user_keywords))]
    elif (cate == "全部") and (cond == 'or'):
        df_query = df[(df['date'] >= start_date) & (df['date'] <= end_date)
                      & df.content.apply(lambda text: any((qk in text) for qk in user_keywords))]
    elif cond == 'and':
        df_query = df[(df.category == cate)
                      & (df.date >= start_date) & (df.date <= end_date)
                      & df.content.apply(lambda text: all((qk in text) for qk in user_keywords))]
    elif cond == 'or':
        df_query = df[(df.category == cate)
                      & (df['date'] >= start_date) & (df['date'] <= end_date)
                      & df.content.apply(lambda text: any((qk in text) for qk in user_keywords))]
    else:
        # unknown condition: fail with a clear message instead of an UnboundLocalError
        raise ValueError(f"cond must be 'and' or 'or', got {cond!r}")
    return df_query
import ast  # used below to parse the stringified token lists

# ** How many pieces of news were the keyword(s) mentioned in?
# ** How many times were the keyword(s) mentioned?
# For the df_query, count the occurrence and frequency for every category:
# (1) cate_occurrence={} number of pieces containing the keywords
# (2) cate_freq={} number of times the keywords were mentioned
news_categories = ['全部', '人物', '議題', '新聞', '雜吹']

def count_keyword(query_df, user_keywords):
    cate_occurrence = {}
    cate_freq = {}
    for cate in news_categories:
        cate_occurrence[cate] = 0
        cate_freq[cate] = 0
    for idx, row in query_df.iterrows():
        # count number of news
        cate_occurrence[row.category] += 1
        cate_occurrence['全部'] += 1
        # count user keyword frequency by checking every word in tokens_v2
        # literal_eval parses the stored list literal without executing arbitrary code
        tokens = ast.literal_eval(row.tokens_v2)
        freq = len([word for word in tokens if (word in user_keywords)])
        cate_freq[row.category] += freq
        cate_freq['全部'] += freq
    return cate_freq, cate_occurrence
def get_keyword_time_based_freq(df_query):
    date_samples = df_query.date
    query_freq = pd.DataFrame({'date_index': pd.to_datetime(date_samples), 'freq': [1 for _ in range(len(df_query))]})
    data = query_freq.groupby(pd.Grouper(key='date_index', freq='D')).sum()
    time_data = []
    for i, idx in enumerate(data.index):
        row = {'x': idx.strftime('%Y-%m-%d'), 'y': int(data.iloc[i].freq)}
        time_data.append(row)
    return time_data

print("app_user_keyword was loaded!")
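The four branches of filter_dataFrame differ only in whether a category is applied and whether all or any keyword must match. The sketch below is one possible way to express the same logic with a single boolean mask. It is not the teacher's code: the name filter_news, the explicit df parameter, and the three-row sample frame are assumptions made so the example runs on its own. It assumes, as views.py does, that the date column holds 'YYYY-MM-DD' strings.

```python
from datetime import datetime, timedelta

import pandas as pd


def filter_news(df, user_keywords, cond, cate, weeks):
    """Single-mask variant of the filtering logic (hypothetical name)."""
    # Window: from `weeks` before the newest record up to the newest record
    end_date = df["date"].max()
    start_date = (datetime.strptime(end_date, "%Y-%m-%d").date()
                  - timedelta(weeks=weeks)).strftime("%Y-%m-%d")

    # 'and' requires every keyword; anything else is treated as 'or' here (assumption)
    match = all if cond == "and" else any

    # ISO-formatted date strings compare correctly as plain strings
    mask = df["date"].between(start_date, end_date)
    mask &= df["content"].apply(lambda text: match(qk in text for qk in user_keywords))
    if cate != "全部":
        mask &= df["category"] == cate
    return df[mask]


# Made-up sample rows, only for illustration
sample = pd.DataFrame({
    "date": ["2025-03-01", "2025-03-20", "2025-03-25"],
    "category": ["人物", "議題", "人物"],
    "content": ["臺灣 新聞", "臺灣 台東 報導", "台東 活動"],
})

both = filter_news(sample, ["臺灣", "台東"], "and", "全部", 2)
either = filter_news(sample, ["臺灣", "台東"], "or", "全部", 2)
either_person = filter_news(sample, ["臺灣", "台東"], "or", "人物", 2)
print(len(both), len(either), len(either_person))
```

With a two-week window the first row falls outside the date range, so it is excluded regardless of its content.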
Add the front-end page, then adjust the options to match the dataset (the news categories) and set the default filter keywords with value="臺灣 台東".
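To see what the default value means on the server side, here is a small standard-library sketch of how a space-separated keyword string is split and then counted against one row's token list, mirroring the counting step in count_keyword. The function name count_keyword_hits and the sample token string are assumptions for illustration. Matching is exact, so a variant spelling such as 台灣 is not counted as 臺灣.

```python
import ast
from collections import Counter


def count_keyword_hits(tokens_repr, user_keywords):
    """Count how often any of the keywords appears in a stringified token list."""
    tokens = ast.literal_eval(tokens_repr)  # parse the list literal safely
    counts = Counter(tokens)
    # A set avoids double counting if the user typed the same keyword twice
    return sum(counts[keyword] for keyword in set(user_keywords))


# The default input value is split on whitespace, as views.py does with userkey.split()
user_keywords = "臺灣 台東".split()

# Made-up token list in the stringified form assumed for the tokens_v2 column
row_tokens = "['臺灣', '台東', '旅遊', '臺灣']"

hits = count_keyword_hits(row_tokens, user_keywords)
variant_hits = count_keyword_hits("['台灣']", ["臺灣"])
print(user_keywords, hits, variant_hits)
```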
home.html
{% extends 'base.html' %} {% block title %}
User Keyword Query
{% endblock %} {% block content %}
<div class="col-lg-12">
<h1>Analyze the keywords you care about</h1>
<p>Run a popularity analysis on each keyword you enter</p>
</div>
<div class="col-lg-6 mb-2">
<!-- Start of input conditions block -->
<div class="card">
<div class="card-header">
<h3 class="h6 text-uppercase mb-0">Query conditions</h3>
</div>
<div class="card-body">
<div class="mb-3 row">
<label class="col-md-3 col-form-label">Which keyword do you care about?</label>
<div class="col-md-9">
<input id="input_keyword" name="userkey" value="臺灣 台東" class="form-control" />
<div class="form-text text-muted">Keywords to look up; enter several separated by spaces. Mainly person names, products, and geographic regions (searches the tokenized words, not the full text).</div>
</div>
</div>
<div class="mb-3 row">
<label class="col-sm-3 col-form-label">Condition</label>
<div class="col-md-9 mb-3">
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" id="cond_and" value="and" name="condradio" />
<label class="form-check-label" for="cond_and">and</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" id="cond_or" value="or" name="condradio" checked />
<label class="form-check-label" for="cond_or">or</label>
</div>
</div>
</div>
<div class="mb-3 row">
<label class="col-sm-3 col-form-label">News category</label>
<div class="col-md-9">
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" id="cate_all" value="全部" name="cateradio" checked />
<label class="form-check-label" for="cate_all">全部</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" id="cate_politics" value="人物" name="cateradio" />
<label class="form-check-label" for="cate_politics">人物</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" id="cate_tech" value="議題" name="cateradio" />
<label class="form-check-label" for="cate_tech">議題</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" id="cate_sports" value="新聞" name="cateradio" />
<label class="form-check-label" for="cate_sports">新聞</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" id="cate_stocks" value="雜吹" name="cateradio" />
<label class="form-check-label" for="cate_stocks">雜吹</label>
</div>
</div>
</div>
<div class="mb-3 row">
<label class="col-md-3 col-form-label">How many recent weeks?</label>
<div class="col-md-9">
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" id="wk1" value="1" name="wkradio" />
<label class="form-check-label" for="wk1">1</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" id="wk2" value="2" name="wkradio" checked />
<label class="form-check-label" for="wk2">2</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" id="wk3" value="3" name="wkradio" />
<label class="form-check-label" for="wk3">3</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" id="wk4" value="4" name="wkradio" />
<label class="form-check-label" for="wk4">4</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" id="wk6" value="6" name="wkradio" />
<label class="form-check-label" for="wk6">6</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" id="wk8" value="8" name="wkradio" />
<label class="form-check-label" for="wk8">8</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" id="wk12" value="12" name="wkradio" />
<label class="form-check-label" for="wk12">12</label>
</div>
<div class="form-text text-muted">Counting back from the latest date in the data, how many weeks?</div>
</div>
</div>
<div class="mb-3 row">
<div class="col-md-9 ms-auto">
<button type="button" id="btn_ok" class="btn btn-primary">Search</button>
</div>
</div>
</div>
</div>
</div>
<!-- End of input block -->
<!-- Display block -->
<div class="col-lg-6 mb-2">
<div class="card">
<div class="card-header">
<h3 class="h6 text-uppercase mb-0">Frequency over time</h3>
</div>
<div class="card-body">
<small>See how many articles were published on each date (volume of coverage)</small>
<div class="row">
<canvas id="keyword_time_line_chart"></canvas>
</div>
</div>
</div>
</div>
<!-- End of block -->
<!-- Article count block -->
<div class="col-lg-6 mb-2">
<div class="card">
<div class="card-header">
<h3 class="h6 text-uppercase mb-0">Popularity: how many news articles mention it?</h3>
</div>
<div class="card-body">
<ul id="keyword_article_count"></ul>
</div>
</div>
</div>
<!-- End of block -->
<!-- Keyword frequency block -->
<div class="col-lg-6 mb-2">
<div class="card">
<div class="card-header">
<h3 class="h6 text-uppercase mb-0">Popularity: how many times is it mentioned?</h3>
</div>
<div class="card-body">
<ul id="keyword_frequency"></ul>
</div>
</div>
</div>
<!-- End of block -->
{% endblock %} {% block extra_js %}
<!-- The JavaScript here is loaded and executed only after the page has initialized -->
<!-- Chart.js scripts -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/moment.js/2.13.0/moment.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/2.7.3/Chart.min.js"></script>
<!-- Script section -->
<script>
call_ajax()
//** button and radio change events
$('#btn_ok').on('click', function () {
call_ajax()
}) //event function
$("input[name='cateradio']").on('change', function () {
call_ajax()
}) //event function
$("input[name='wkradio']").on('change', function () {
call_ajax()
}) //event function
$("input[name='condradio']").on('change', function () {
call_ajax()
}) //event function
function call_ajax() {
const userkey = $('#input_keyword').val()
const weeks = $("input[name='wkradio']:checked").val()
const cate = $("input[name='cateradio']:checked").val()
const cond = $("input[name='condradio']:checked").val()
if (userkey.length < 2) {
alert('The keyword must not be empty or shorter than two characters!')
return 0
}
$.ajax({
type: 'POST',
url: 'api_get_top_userkey/',
//url: 'http://163.18.23.20:8000/userkeyword/api_get_top_userkey/',
data: {
userkey: userkey,
cate: cate,
weeks: weeks,
cond: cond
}, // pass to server
success: function (received) {
const article_count = received['key_occurrence_cat']
console.log(article_count)
$('#keyword_article_count').empty()
// wrap each entry in an <li> tag and append it for display
for (let key in article_count) {
let paste = '<li>' + key + ':' + article_count[key] + '</li>'
$('#keyword_article_count').append(paste)
}
const kwfreq = received['key_freq_cat']
console.log(kwfreq)
$('#keyword_frequency').empty()
for (let key in kwfreq) {
let paste = '<li>' + key + ':' + kwfreq[key] + '</li>'
$('#keyword_frequency').append(paste)
}
const data_key_time_freq = received['key_time_freq']
console.log(data_key_time_freq)
showtimechart(data_key_time_freq)
} //function
}) //ajax
} //call_ajax()
// global variable that holds the current chart instance
let line_chart = null
function showtimechart(data_key_time_freq) {
// get the canvas drawing context
const ctx_key_time = document.getElementById('keyword_time_line_chart').getContext('2d')
const myoptions = {
type: 'line',
data: {
datasets: [
{
label: 's2',
borderColor: 'red',
data: data_key_time_freq // your data here!
}
]
},
options: {
legend: {
display: false
},
scales: {
xAxes: [
{
type: 'time',
time: {
unit: 'day',
displayFormats: {
//day: 'DD-MM-YYYY'
day: 'MM/DD'
}
}
}
],
yAxes: [
{
ticks: {
beginAtZero: true
},
display: true,
scaleLabel: {
display: true,
labelString: 'Number of occurrences'
}
}
]
}
}
}
// destroy the previous chart, if any
if (line_chart) {
line_chart.destroy()
}
// draw the new chart
line_chart = new Chart(ctx_key_time, myoptions)
}
</script>
{% endblock %}
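The line chart receives key_time_freq as a list of {x, y} points, one per day. The sketch below reproduces that shape with pandas so the format can be inspected outside Django. The function name daily_article_counts and the sample dates are assumptions for illustration. Grouping with a daily pd.Grouper also emits days that have no articles, with a count of 0, so the line does not skip dates.

```python
import pandas as pd


def daily_article_counts(dates):
    """Turn a list of 'YYYY-MM-DD' strings into Chart.js-style {x, y} points."""
    frame = pd.DataFrame({
        "date_index": pd.to_datetime(dates),
        "freq": [1] * len(dates),  # each article contributes one count
    })
    # One bin per calendar day between the first and last date
    daily = frame.groupby(pd.Grouper(key="date_index", freq="D"))["freq"].sum()
    return [{"x": day.strftime("%Y-%m-%d"), "y": int(count)}
            for day, count in daily.items()]


# Made-up dates: two articles on the 7th, none on the 8th, one on the 9th
points = daily_article_counts(["2025-03-07", "2025-03-07", "2025-03-09"])
print(points)
```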
Hot-person keyword analysis
Add a time filter that keeps only the last day of news: "hot persons of yesterday.csv"
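A minimal sketch of the time filter described above, assuming the news DataFrame has a date column like the one used in views.py. The function name keep_last_day and the sample rows are hypothetical; this is one way to keep only the newest day, not necessarily how the course solution does it.

```python
import pandas as pd


def keep_last_day(df, date_col="date"):
    """Return only the rows whose date equals the newest date in the frame."""
    # normalize() drops any time-of-day part so rows from the same day compare equal
    dates = pd.to_datetime(df[date_col]).dt.normalize()
    return df[dates == dates.max()]


# Made-up sample rows, only for illustration
news = pd.DataFrame({
    "date": ["2025-03-25", "2025-03-26", "2025-03-26"],
    "title": ["a", "b", "c"],
})

last_day = keep_last_day(news)
print(last_day)

# The filtered rows could then feed the hot-person analysis, or be saved, e.g.:
# last_day.to_csv("hot persons of yesterday.csv", index=False)
```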
First version: 2025-03-27, 11:45 PM
Latest version: 2025-03-27, 11:45 PM