E.SUN (玉山) NLP Application Challenge

3rd. BlackBox Operator

Crawling

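With the earlier groundwork in place,
it took basically one evening to crawl all of the data.
Crawling is relatively easy, but it involves a lot of repetitive work.
Below I describe how I crawled the news articles.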

Naive Model

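The model can be roughly divided into two parts.

classifier: decides whether an article is AML (anti-money laundering) news.
extractor: extracts the target person names.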
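
To make the two-part split concrete, here is a minimal sketch of how a classifier and an extractor could be chained. It is not the team's actual implementation: the `AMLPipeline` class, the keyword-based `keyword_classifier`, the lookup-based `lookup_extractor`, and the names in `KNOWN_NAMES` are all hypothetical stand-ins, and the rule "skip extraction when the article is not AML news" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AMLPipeline:
    """Hypothetical two-stage pipeline: classify first, then extract."""
    classifier: Callable[[str], bool]       # True if the article is AML news
    extractor: Callable[[str], List[str]]   # returns target person names

    def predict(self, article: str) -> List[str]:
        # Assumption: names are only extracted from articles
        # that the classifier flags as AML news.
        if not self.classifier(article):
            return []
        return self.extractor(article)


# Toy stand-ins so the sketch runs; a real system would use trained models.
AML_KEYWORDS = ("洗錢", "money laundering")
KNOWN_NAMES = ("王小明", "陳大文")  # made-up example names


def keyword_classifier(article: str) -> bool:
    # Flags an article if any AML keyword appears in it.
    return any(keyword in article for keyword in AML_KEYWORDS)


def lookup_extractor(article: str) -> List[str]:
    # Returns the known names that appear in the article, in list order.
    return [name for name in KNOWN_NAMES if name in article]


pipeline = AMLPipeline(keyword_classifier, lookup_extractor)
print(pipeline.predict("王小明涉嫌洗錢遭起訴"))
print(pipeline.predict("王小明參加馬拉松"))
```

Keeping the two stages behind plain callables means either one can be swapped for a stronger model without touching the other.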

The following describes how this was first implemented.
