- [ ] API
  - [x] write
  - [x] online
  - [x] token
  - [x] shorter
- [ ] system
  - [x] rebuild the structure
  - [x] 152 docker auto start
  - [x] readme 1.0
  - [x] HanLP server / client
  - [x] CKIP server / client
  - [x] web re-process
  - [x] eval re-process
  - [x] add LDA to trend system, find different aspects (requested by the oral defense committee)
  - [x] readme 2.0
- [ ] Paper
  - [x] write acknowledgments
  - [x] check
  - [x] change PTT/paper trend example (requested by the oral defense committee)
  - [x] upload
  - [ ] add 承緯's summary to the baseline in ch4.4 / ch4.5
  - [ ] write LDA method
- [ ] WEB UI
  - [x] online
  - [x] news original URL
  - [x] more button
  - [x] summary UI
  - [x] summary UI: add calendar
  - [x] summary UI example
- [ ] BUG
  - [x] timeline latest/earliest
  - [x] grouping (API, no NER)
  - [x] NLG
- [ ] MORE
  - [ ] add UDN data, compare with old data (requested by the oral defense committee)
  - [ ] add topic (requested by 盧)
  - [ ] automatically find topics and generate patterns
# timeline_summary
## Related Components
- HanLP server
- CKIP server
- news DB (Elasticsearch)
- crawler

Details: https://gitlab.ilovetogether.com/p76104273/crawler_hanlp_es/blob/master/README.md
## Project Structure
```
top
| route.py (server route entry)
| main.py (summary)
| main_eval.py (summary for eval)
| main_web.py (summary for web)
```
```
main
+---model (for word/phrase similarity)
| | embedding.py
+---read (get news & read pattern)
| | es_query.py
| | query_file.py
| +---asset
| | gREAF.csv
| | speakv.txt
| | sREAF.csv
| | topic.txt
| | 國家列表.csv (country list)
+---utils
| | eventchain.py (predict PREAFS & summary chain)
| | parsener.py (extract ner)
| | parsetree.py (extract svo_event)
| | timeline.py (write db)
| | tool.py (preprocessing)
```
```
eval
| main_eval.py (summary for eval)
+---eval
| | label.csv (ground truth)
| +---data (input)
| | 口罩-eval.json (face masks)
| | 快篩-eval.json (rapid tests)
| | 疫苗-eval.json (vaccines)
| | 缺水-eval.json (water shortage)
| | 缺蛋-eval.json (egg shortage)
| | 缺電-eval.json (power shortage)
| +---pred (output)
```
```
web
| route.py (server route entry)
| main_web.py (summary for web)
+---web
| +---db (guidance summary db)
| | +---fromes (news cache)
| | +---口罩 (face masks)
| | +---快篩 (rapid tests)
| | +---疫苗 (vaccines)
| | +---缺水 (water shortage)
| | +---缺蛋 (egg shortage)
| | +---缺電 (power shortage)
| +---static
| | +---css
| | +---images
| | +---js
| +---templates
| | index.html
```
```
paper
| structure.drawio
| structure_final.jpg
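| 口試PPT_草稿.pptx (defense slides, draft)
| 口試PPT_final.pptx (defense slides, final)
| 論文_草稿.docx (thesis, draft)
| 論文_overleaf.zip (thesis, Overleaf project)
| 論文_final.pdf (thesis, final)
| 論文進度.pptx (thesis progress slides)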
```
## main Overview
- feature
  - HanLP
    - dep -> SVO (event)
    - sdp -> find unordered verbs
  - CKIP
    - ner -> NER phrases & conversion
- mark
  - attach speakv to SVO
  - attach NER to SVO
  - attach unordered verbs to SVO
  - predict REAF (of PREAFS) for each SVO
- chain
  - predict PS (of PREAFS) on SVO
  - svo-svo-...
- grouping / ranking
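The dep -> SVO step can be pictured with a minimal sketch. It assumes HanLP-style dependency output in which each token gets a `(head, relation)` pair, heads are 1-based (0 = root), and subjects/objects are labelled `nsubj`/`dobj`. The function `extract_svo` and the toy parse are hypothetical and are not the code in `utils/parsetree.py`.

```python
from typing import List, Tuple

# ASSUMED format: dep[i] = (head, relation) for token i,
# where head is a 1-based token index and 0 marks the root.
Dep = Tuple[int, str]


def extract_svo(tok: List[str], dep: List[Dep]) -> List[Tuple[str, str, str]]:
    """Return (subject, verb, object) triples found in one dependency parse."""
    triples = []
    for v_idx in range(len(tok)):
        # Children of token v_idx point at it with the 1-based index v_idx + 1.
        subjects = [i for i, (head, rel) in enumerate(dep)
                    if head == v_idx + 1 and rel == "nsubj"]
        objects = [i for i, (head, rel) in enumerate(dep)
                   if head == v_idx + 1 and rel == "dobj"]
        # Every subject/object pair under the same verb forms one SVO event.
        for s in subjects:
            for o in objects:
                triples.append((tok[s], tok[v_idx], tok[o]))
    return triples


if __name__ == "__main__":
    tok = ["台電", "啟動", "限電"]                      # hypothetical toy sentence
    dep = [(2, "nsubj"), (0, "root"), (2, "dobj")]    # hand-written parse
    print(extract_svo(tok, dep))  # [('台電', '啟動', '限電')]
```

A real parse also needs coordinated verbs, passive subjects and multi-token phrases, which this sketch ignores.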
## WEB
### Timeline Website
Built with Flask + Vue: http://140.116.245.152:4273/
### Startup Steps
#### File Location
On host 140.116.245.152, go to `/home/wmmkslab/timeline_summary`.
#### Build the docker image
`docker build -t 2023-timeline-summary:latest .`
#### Start the container
`docker run -itd -p 4273:80 -v /home/wmmkslab/timeline_summary:/home/wmmkslab/timeline_summary --restart always --name=2023-timeline-summary 2023-timeline-summary`
#### Enter the container
`docker exec -it 2023-timeline-summary /bin/bash`
#### Leave the container (it keeps running because it was started with `-d`)
`exit`
### Port Configuration
Go to `/etc/apache2/sites-available` and open `000-default.conf`:
```
<VirtualHost *:4273>
    # config for p76104273 demo system
    ServerAdmin p76104273@gs.ncku.edu.tw
    ServerName 140.116.245.152
    DocumentRoot /home/wmmkslab/timeline_summary
    # ProxyPreserveHost On
    ErrorLog ${APACHE_LOG_DIR}/error_2023_timeline_summary.log
    CustomLog ${APACHE_LOG_DIR}/access_2023_timeline_summary.log combined
    <Directory "/home/wmmkslab/timeline_summary">
        AllowOverride None
        Require all granted
    </Directory>
</VirtualHost>
```
## API (Primary)
- timeline web = news_trend + guidance_summary
- guidance_summary ⊃ time_converter + grouping (guidance_summary uses time_converter and grouping internally)
### news_trend
Goal: based on a topic's news trend, find the news data for different periods within the given date range.

#### File Location
- code: `code/api/news_trend`
- client: `code/api/news_trend/client.py`
- server: http://140.116.245.152:4273/news_trend
#### Input / Output
input
* topic name
* date range
```
topic = "缺電"
start = "2022/03/01"
end = "2022/03/30"
```
output
* time periods (start and end dates)
* the news data for each period

The result is written to `test.json`:
```
[
{
"start_date": "2022-03-03",
"end_date": "2022-03-04",
"data": [ some news ... ]
},
{
"start_date": "2022-03-04",
"end_date": "2022-03-10",
"data": [ some news ... ]
}
]
```
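A minimal client-side sketch for consuming this output. Only the endpoint URL and the output shape come from this README; the request format in `fetch_periods` (a JSON POST with fields `topic`, `start`, `end`) is an assumption, so check `code/api/news_trend/client.py` for the real one. All function names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List, Tuple

# Endpoint from this README; the request format below is an ASSUMPTION.
NEWS_TREND_URL = "http://140.116.245.152:4273/news_trend"


def fetch_periods(topic: str, start: str, end: str,
                  url: str = NEWS_TREND_URL) -> List[Dict]:
    """Hypothetical client call: POST the query as JSON, decode the JSON reply."""
    body = json.dumps({"topic": topic, "start": start, "end": end}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def load_periods(path: str = "test.json") -> List[Dict]:
    """Read the period list that news_trend writes to disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def period_overview(periods: List[Dict]) -> List[Tuple[str, str, int]]:
    """Return (start_date, end_date, number of news items) for each period."""
    return [(p["start_date"], p["end_date"], len(p["data"])) for p in periods]


if __name__ == "__main__":
    # Placeholder periods shaped like the output above; the news items are dummies.
    periods = [
        {"start_date": "2022-03-03", "end_date": "2022-03-04", "data": [{}, {}]},
        {"start_date": "2022-03-04", "end_date": "2022-03-10", "data": [{}]},
    ]
    for start, end, count in period_overview(periods):
        print(f"{start} ~ {end}: {count} news items")
```

Against the live server, `fetch_periods("缺電", "2022/03/01", "2022/03/30")` would replace the placeholder list, provided the assumed request format matches.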
### guidance_summary
Goal: produce a PREAFS summary.
A web UI is planned (not yet built).
#### File Location
- code: `code/*`
- client: `code/main.py`
- server: http://140.116.245.152:4273/summary
#### Input / Output
input
* topic
* news release time
* news body
```
input_topic = "缺電"
articles = [
{
"release_time": "2022-03-03",
"body": " news article ... "
},
{
"release_time": "2022-03-04",
"body": " news article ... "
},
...
]
```
output
* guidance summary
```
[
(PREAFS_label, summary),
(PREAFS_label, summary),
...
]
```
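A small sketch of preparing the `articles` input and regrouping the `(PREAFS_label, summary)` output. The helper names are hypothetical, sorting articles oldest-first is a convenience assumption rather than a documented requirement, and the label values in the example are placeholders.

```python
from typing import Dict, List, Tuple


def make_articles(pairs: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Build the documented `articles` input from (release_time, body) pairs.

    Oldest-first order is an assumption; ISO-style dates such as
    "2022-03-03" sort correctly as plain strings.
    """
    return [{"release_time": t, "body": b} for t, b in sorted(pairs)]


def group_by_label(summary: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect the summaries under each PREAFS label, keeping their original order."""
    grouped: Dict[str, List[str]] = {}
    for label, text in summary:
        grouped.setdefault(label, []).append(text)
    return grouped


if __name__ == "__main__":
    articles = make_articles([("2022-03-04", "news article B"),
                              ("2022-03-03", "news article A")])
    print([a["release_time"] for a in articles])  # ['2022-03-03', '2022-03-04']
    # Hypothetical output pairs; real label values come from the service.
    print(group_by_label([("P", "summary 1"), ("R", "summary 2"), ("P", "summary 3")]))
```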
## API (Secondary)
### time_converter
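Goal: convert a Chinese relative time expression into a date, given an anchor date.
Example: 2022/03/05 + 前兩天 (two days earlier) -> 2022/03/03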
#### File Location
- code: `code/api/time_converter`
- client: `code/api/time_converter/client.py`
- server: http://140.116.245.152:4273/time_converter
#### Input / Output
input
* anchor date
* expression to convert
```
time_str = "後三天"  # "three days later"
anchor_time = "2022/03/01"
```
output
* the converted date
```
2022/03/04
```
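The conversion can be illustrated with a deliberately narrow sketch that handles only the `前N天` / `後N天` forms shown above (N as Arabic digits or a single Chinese numeral up to 十). The function `convert` and the numeral table are hypothetical; the real time_converter presumably covers many more expressions.

```python
import re
from datetime import datetime, timedelta

# Hypothetical table for single Chinese numerals (1-10 only).
CN_DIGITS = {"一": 1, "二": 2, "兩": 2, "三": 3, "四": 4, "五": 5,
             "六": 6, "七": 7, "八": 8, "九": 9, "十": 10}


def convert(time_str: str, anchor_time: str) -> str:
    """Resolve 前N天 (N days earlier) / 後N天 (N days later) against anchor_time (YYYY/MM/DD)."""
    m = re.fullmatch(r"([前後])(\d+|[一二兩三四五六七八九十])天", time_str)
    if m is None:
        raise ValueError(f"unsupported expression: {time_str}")
    direction, num = m.groups()
    days = int(num) if num.isdigit() else CN_DIGITS[num]
    if direction == "前":  # "earlier" moves backwards in time
        days = -days
    anchor = datetime.strptime(anchor_time, "%Y/%m/%d")
    return (anchor + timedelta(days=days)).strftime("%Y/%m/%d")


if __name__ == "__main__":
    print(convert("後三天", "2022/03/01"))  # 2022/03/04
    print(convert("前兩天", "2022/03/05"))  # 2022/03/03
```

Using `timedelta` keeps month and year boundaries correct without manual calendar arithmetic.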
### grouping
Goal: given a list of sentences, remove duplicates, i.e. sentences that say the same thing in different words.
#### File Location
- code: `code/api/grouping`
- client: `code/api/grouping/client.py`
- server: http://140.116.245.152:4273/grouping
#### 輸入輸出
input
* a list of sentences
```
sents = [
"原因出在興達電廠開關站事故",
"原因是興達電廠開關場事故",
"羅秉成說,興達電廠開關廠有故障",
"電廠故障造成事故"
]
```
output
* the list of sentences after grouping
```
['羅秉成說,興達電廠開關廠有故障']
```
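One way to picture grouping is greedy near-duplicate clustering. This sketch swaps in a character-level similarity (`difflib.SequenceMatcher`) as a stand-in; the actual service likely uses a different, semantic measure, and the function names and the 0.5 threshold are hypothetical. With this crude measure the four example sentences above may not all collapse into a single group.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (a stand-in, not the project's measure)."""
    return SequenceMatcher(None, a, b).ratio()


def group_sentences(sents: List[str], threshold: float = 0.5) -> List[str]:
    """Greedily cluster near-duplicate sentences and keep the longest one per cluster."""
    groups: List[List[str]] = []
    for sent in sents:
        for group in groups:
            # Compare against the first sentence that opened each group.
            if similarity(sent, group[0]) >= threshold:
                group.append(sent)
                break
        else:  # no existing group was similar enough: start a new one
            groups.append([sent])
    return [max(group, key=len) for group in groups]


if __name__ == "__main__":
    sents = [
        "原因出在興達電廠開關站事故",
        "原因是興達電廠開關場事故",
        "羅秉成說,興達電廠開關廠有故障",
        "電廠故障造成事故",
    ]
    print(group_sentences(sents))
```

Keeping the longest sentence per group mirrors the example output, which retains the most detailed sentence, but that selection rule is an assumption.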
## Sample Code
### general_kg
#### File Location
- code: `api/general_kg`
#### Design Structure
```
general_kg
├── main.py (entry point)
├── mi2s_parser.py (builds the svo_event list)
├── mi2s_extractor.py (applies specific labels to each svo_event)
└── pattern (pattern definitions for those labels)
    └── concept.csv
```
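#### Pattern Structure
Currently, collocated phrases are matched without regard to their order.
```
service: service name
topic: topic
label-type-1: expected label 1
label-type-2: expected label 2
colen: number of collocated phrases
desc-i: description (for human readers)
type-i: +/- (+ = phrase should co-occur, - = phrase should not co-occur)
phrase-i: keyword
```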
```
example1
service: 說什麼
topic:
label-type-1: Y
label-type-2:
colen: 1
desc-1:
type-1: +
phrase-1: 提到
example2
service: 懶人包結構
topic:
label-type-1: R
label-type-2:
colen: 2
desc-1:
type-1: +
phrase-1: 需求
desc-2:
type-2: +
phrase-2: 上升
```
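A minimal sketch of reading one pattern record in the key-value layout of the examples above and testing it against an SVO string. It assumes that layout literally; the on-disk `concept.csv` format may differ, and `parse_pattern` / `matches` are hypothetical names, not functions from `mi2s_extractor.py`.

```python
from typing import Dict, List


def parse_pattern(lines: List[str]) -> Dict:
    """Parse one pattern record written as 'key: value' lines."""
    fields: Dict[str, str] = {}
    for line in lines:
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    colen = int(fields.get("colen") or 0)
    phrases = [
        {
            "desc": fields.get(f"desc-{i}", ""),
            "type": fields[f"type-{i}"],      # "+" or "-"
            "phrase": fields[f"phrase-{i}"],
        }
        for i in range(1, colen + 1)
    ]
    # Keep only the label slots that are actually filled in.
    labels = [v for k, v in fields.items() if k.startswith("label-type-") and v]
    return {"service": fields.get("service", ""), "topic": fields.get("topic", ""),
            "labels": labels, "phrases": phrases}


def matches(pattern: Dict, svo: str) -> bool:
    """True if every '+' phrase occurs in svo and no '-' phrase does.

    Only substring presence is checked, so phrase order is ignored.
    """
    for p in pattern["phrases"]:
        wanted = p["type"] == "+"
        if wanted != (p["phrase"] in svo):
            return False
    return True


if __name__ == "__main__":
    example2 = [
        "service: 懶人包結構", "topic:", "label-type-1: R", "label-type-2:",
        "colen: 2", "desc-1:", "type-1: +", "phrase-1: 需求",
        "desc-2:", "type-2: +", "phrase-2: 上升",
    ]
    pattern = parse_pattern(example2)
    print(pattern["labels"])                   # ['R']
    print(matches(pattern, "用電需求持續上升"))  # True
    print(matches(pattern, "用電需求下降"))      # False
```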
#### Input / Output
input
* text
* pattern definitions
```
text = '日本經濟部長王美花昨赴立院備詢時表示,三○三停電主因是人為疏失、的確非缺電,而近期多處遭到限電,但會加強設備巡檢。'
eventlist, hanlp_result = extracting(text)
```
output
* eventlist
* hanlp_result
```
eventlist =
[
{
"svo": str,
"orisent": str,
"speakv-1": label,
"PREAFS-1": label,
"something you define": label,
...
},
{
"svo": str,
"orisent": str,
"speakv-1": label,
"PREAFS-1": label,
"something you define": label,
...
},
]
```
```
hanlp_result =
{
"tok": [],"pos": [],"dep": [], "sdp": [],
"toki2si": [],
}
```
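A short sketch for consuming `eventlist`: it maps each label value stored under a chosen key (such as `PREAFS-1`) to the original sentences of the events carrying it. It assumes label values are strings and that unlabelled events simply lack the key; `sentences_by_label` and the sample events are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def sentences_by_label(eventlist: List[Dict], key: str = "PREAFS-1") -> Dict[str, List[str]]:
    """Map each label value stored under `key` to the original sentences of its events."""
    result: Dict[str, List[str]] = defaultdict(list)
    for event in eventlist:
        label = event.get(key)
        if label:  # skip events that carry no label under this key
            result[label].append(event["orisent"])
    return dict(result)


if __name__ == "__main__":
    # Hypothetical events shaped like the eventlist above; labels are placeholders.
    events = [
        {"svo": "停電 主因 人為疏失", "orisent": "三○三停電主因是人為疏失", "PREAFS-1": "R"},
        {"svo": "會 加強 巡檢", "orisent": "會加強設備巡檢", "PREAFS-1": "Y"},
        {"svo": "多處 遭到 限電", "orisent": "近期多處遭到限電"},
    ]
    print(sentences_by_label(events))
```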