# PyCon TW 2016 Collaborative Talk Notes <br> Day 3 - R1
> ### Quick Links
> - [Portal for Collaboration Notes 共筆統整入口](https://hackfoldr.org/pycontw2016) (hosted by [hackfoldr](https://hackfoldr.org/about) and [HackMD](https://hackmd.io/))
> - [Program Schedule 議程時間表](https://tw.pycon.org/2016/events/talks/)
> - [PyCon TW 2016 Official Site 官網](https://tw.pycon.org/2016/)
>
> ### How to update this note?
> - Everyone can *freely* update this note. 任何人都能自由地更新內容。
> - Please respect all the participants and follow our [code of conduct](https://tw.pycon.org/2016/about/code-of-conduct/) during discussion. 討論、記錄時,請遵守大會的[行為準則](https://tw.pycon.org/2016/about/code-of-conduct/)。
## Talk: 如何打造關鍵字精靈 (How to Build a Keyword Wizard)
- Info: https://tw.pycon.org/2016/en-us/events/talk/57694625669840919/
- Slides: http://www.slideshare.net/ssuser05afc89/how-to-build-an-keyword-wizard
- Speaker: 施晨揚
#### What is a keyword?
A keyword is a word or phrase that works as an indicator or identifier and also carries a specific meaning.
#### Why do we need keywords?
- Advertisement
- Tag
- Relation
- Article summary
#### Word Relation Model
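Relation-based search
* Model 1 (related words)
沖繩 (Okinawa) -> 飯店 (hotel), 自由行 (independent travel), 推薦 (recommendations)
* Model 2 (synonyms)
沖繩 (Okinawa) -> 琉球 (Ryukyu), 壺屋通 (Tsuboya Street), ...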
#### Word Representation - Vector Space Model
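Map words and articles into a multi-dimensional vector space,
so that you can see which articles or which words are related.
#### One-Hot vs. Continuous Value
When the dimensionality is very high (a multi-dimensional space), it is hard to tell which words are similar.
#### Word Representation - One Hot Representation
The simplest approach is the One Hot Representation.
First build a `One Hot Index` for every word.
However, this encoding captures no relation between words, so finding related words is hard.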
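
To make the limitation concrete, here is a minimal sketch of a one-hot index in plain Python. The vocabulary and the helper names (`build_one_hot_index`, `one_hot`, `dot`) are made up for illustration and are not from the talk.

```python
def build_one_hot_index(vocabulary):
    """Assign each distinct word a position in the one-hot vector."""
    return {word: i for i, word in enumerate(sorted(set(vocabulary)))}


def one_hot(word, index):
    """Return a vector that is 1 at the word's position and 0 elsewhere."""
    vector = [0] * len(index)
    vector[index[word]] = 1
    return vector


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy vocabulary: Okinawa, Ryukyu, hotel, diving
vocabulary = ["沖繩", "琉球", "飯店", "潛水"]
index = build_one_hot_index(vocabulary)
okinawa = one_hot("沖繩", index)
ryukyu = one_hot("琉球", index)

# Distinct words never share a non-zero position, so their dot product is 0
# even when the two words are synonyms.
print(dot(okinawa, ryukyu))   # 0
print(dot(okinawa, okinawa))  # 1
```

Every pair of different words looks equally unrelated under this encoding, which is why a denser representation is needed.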
#### Word Representation - Context Vector
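In the example, words are placed on both the X and Y axes to form a table,
and the probability that two words appear together is used to measure how related they are.
E.g. 沖繩 (Okinawa) vs. 浮潛 (snorkeling) = 0.7, 沖繩 (Okinawa) vs. 餐廳 (restaurant) = 0.1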
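
A rough sketch of how such a table could be computed. It assumes the documents are already segmented and estimates the score as the share of documents containing word `a` that also contain word `b`. That definition, the toy corpus, and the function name `cooccurrence_table` are assumptions, since the notes do not say how the talk computed the probabilities.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_table(documents):
    """Estimate P(b appears | a appears) from segmented documents."""
    word_counts = Counter()
    pair_counts = Counter()
    for words in documents:
        unique = set(words)
        word_counts.update(unique)
        # Count each unordered pair once per document, in both directions.
        for a, b in combinations(sorted(unique), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {
        (a, b): count / word_counts[a]
        for (a, b), count in pair_counts.items()
    }


# Made-up toy corpus; each document is already split into words.
documents = [
    ["okinawa", "snorkeling", "beach"],
    ["okinawa", "snorkeling"],
    ["okinawa", "restaurant"],
    ["tokyo", "restaurant"],
]
table = cooccurrence_table(documents)
print(table[("okinawa", "snorkeling")])  # 2 of the 3 okinawa documents
print(table[("okinawa", "restaurant")])  # 1 of the 3 okinawa documents
```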
#### Word Context Vector
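Mention ramen (拉麵) -> 美味しい (Japanese for "delicious")
Mention Ichiran (一蘭) -> 喔依西捏 (a phonetic rendering of "oishii ne", "tasty, isn't it")
so the two words can be linked together.
> But Ichiran and Akasaka (赤坂) really aren't tasty
> Don't be like that XD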
#### Co-occurrence Matrix
If the vocabulary is large, e.g. n ≈ 500k,
then space = n × n and time = n × n
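
A quick back-of-the-envelope check of that cost, assuming a dense matrix with one 4-byte count per cell (the cell size is an assumption; the notes only give n):

```python
n = 500_000         # vocabulary size from the notes
bytes_per_cell = 4  # assumption: one 32-bit count per cell

cells = n * n
dense_bytes = cells * bytes_per_cell
print(f"{cells:,} cells, {dense_bytes / 1e12:.1f} TB")  # 250,000,000,000 cells, 1.0 TB
```

A dense table of this size does not fit in memory on an ordinary machine, which motivates a learned low-dimensional representation.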
#### Word2Vec
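word2vec = a two-layer neural network.
Given「我想要去沖繩 ... 潛水」("I want to go to Okinawa ... diving"), the model must be able to predict `潛水` (diving) after seeing only the preceding words.
Candidate words include 打球 (playing ball), 潛水 (diving), 睡覺 (sleeping), ..., 洗臉 (washing one's face), so there can be many labels.
A neural network can be used to approximate this model.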
[Reference](https://www.tensorflow.org/versions/r0.8/tutorials/word2vec/index.html)
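
To show what the network is trained on, the sketch below builds (center, context) pairs the way the skip-gram variant of word2vec does. It only prepares training examples and does not train anything. The window size and the function name `skipgram_pairs` are illustrative assumptions, not taken from the talk.

```python
def skipgram_pairs(words, window=1):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(words):
        lo = max(0, i - window)
        hi = min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, words[j]))
    return pairs


# Already segmented sentence: "I want to go to Okinawa to dive"
sentence = ["我", "想要", "去", "沖繩", "潛水"]
for center, context in skipgram_pairs(sentence):
    print(center, "->", context)
```

Words that keep appearing in similar contexts, such as 沖繩 and 潛水 here, end up with nearby vectors after training.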
#### Major Process Flow
1. Article Selection
2. Content Extraction
3. Word Cutting
#### Article Raw Data Preparation
Each article is a single line. Segment the article into words and separate the words with spaces.
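
As an illustration of the expected output format, here is a minimal sketch using forward maximum matching against a term list. This is a deliberately simple stand-in, not the algorithm of the segmentation tool used in the talk. The toy dictionary and the function name `segment` are hypothetical.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: take the longest dictionary term at each position."""
    words = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # Fall back to a single character when nothing in the dictionary matches.
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


dictionary = {"我", "想要", "去", "沖繩", "潛水"}  # hypothetical toy term list
line = "我想要去沖繩潛水"
print(" ".join(segment(line, dictionary)))  # 我 想要 去 沖繩 潛水
```

The quality of the result depends heavily on the term list, which is why the following sections focus on collecting terms.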
#### Term Database
Sources for collecting terms:
- Search Log
- Major e-commerce sites (e.g. Alibaba)
- Link1
- Link2
- http://baseterm.com/
- Input method (IME) dictionaries
- Dictionary cracking (詞庫破解)
#### Term Database - Search Log
`google search sole` → `search history` → `Filter & Counting` → `Term Collection`
#### Search Log
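A term dictionary can be built from the search log simply by counting queries.
Once a query has accumulated enough occurrences, we know that `太陽的後裔` (Descendants of the Sun) is a new term.
Odd strings can slip in, though, so the term length has to be limited.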
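
The filter-and-count step could look roughly like this. The thresholds (`min_count`, `max_length`), the sample log, and the function name `collect_terms` are illustrative assumptions, not values from the talk.

```python
from collections import Counter


def collect_terms(queries, min_count=3, max_length=6):
    """Keep queries that are frequent enough and short enough to count as terms."""
    counts = Counter(q.strip() for q in queries if q.strip())
    return {
        term: count
        for term, count in counts.items()
        if count >= min_count and len(term) <= max_length
    }


# Made-up search log.
search_log = (
    ["太陽的後裔"] * 5
    + ["沖繩"] * 4
    + ["太陽的後裔第十集線上看免費"] * 3  # frequent, but too long to be a term
    + ["asdf"] * 1                        # too rare
)
print(collect_terms(search_log))
```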
#### Word Cutting
- Word Cut Tool
- Jieba
- Get Bot Token
> +1 for Jieba (結巴), it works well
## Talk: First try for CAS, SymPy with codegen
- Info: https://tw.pycon.org/2016/en-us/events/talk/58534680193925150/
- Speaker: Chiu-Hsiang Hsu
- Slides: https://speakerdeck.com/wdv4758h/first-try-for-cas-sympy-with-codegen
### Introduction
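+ `SymPy` helps with mathematical computation
+ Create symbols and expressions
+ `simplify`: pass in an expression directly and get the result
+ `expand`: expand an expression
+ `solve`: solve equations
+ `lambdify`: generate code that can be evaluated
+ Can hook into backends for various languages, such as Fortran, NumPy, ...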
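
A short sketch of the functions listed above, assuming SymPy is installed. The expression is made up for illustration, and `ccode` is used here as one example of a code-generating printer; the notes do not record which backends the speaker demonstrated.

```python
import sympy as sp

# Create a symbol and an expression.
x = sp.symbols("x")
expr = (x + 1) ** 2

expanded = sp.expand(expr)                    # x**2 + 2*x + 1
simplified = sp.simplify(expanded / (x + 1))  # cancels the common factor
roots = sp.solve(sp.Eq(expanded, 4), x)       # solutions of (x + 1)**2 = 4

# Turn the symbolic expression into a plain Python function.
f = sp.lambdify(x, expanded, modules="math")
print(f(3))  # 16

# Emit the same expression as C source text.
c_source = sp.ccode(expanded)
print(c_source)
```

`lambdify` also accepts other `modules` values such as `"numpy"`, which is how the generated function can be pointed at a different numeric backend.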
Chebyshev Approx...
> Didn't manage to write this part down orz
### SymEngine
+ SymPy written in C++
## Talk: Geo processing with Python: How to convert, clean, aggregate and compress your geo-data for web
- Info: https://tw.pycon.org/2016/en-us/events/talk/69816036404232254/
- Speaker: Juha Suomalainen
### About
- Architect / engineer at Wiredcraft
- Mainly works on geo data visualization
### what drives me
- Visualizations
- UX
- Engineering: wants to make everything simple and good-looking
### Projects working on
- CO2 visualization
- data.worldbank.org
- Flood Risk
### Technologies
#### Geo Data Format
- shapefiles (GIS format)
- dbf: attribute table
- prj: coordinate system (projection)
- shp: main file (feature geometry)
- shx: index file
#### Shape formats for WEB
- geojson (simple, standard json): https://github.com/geojson/draft-geojson
- topojson (more compact, border sharing): https://github.com/mbostock/topojson
#### Tools
- QGIS (desktop app)
- Geojson.io (web app)
- mapshaper.org (feature simplification)
- mapbox.com (basemap creation)
- js:
- leaflet.js
- mapbox.js (proprietary; the speaker has had a good experience with it)
- d3.js (customisable, low-level APIs); see http://www.taiwanstat.com/ for examples
### simple approach
1. shapefiles and api
2. data processor
3. geojson / json
4. webapp
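
The data processor step (2 to 3) can be sketched with the standard library alone: take shapes that are already in GeoJSON form, attach a value fetched from an API to each feature, and write JSON for the web app. The region name, the `score` property, and the function name `attach_scores` are hypothetical; the notes do not show the speaker's actual processor.

```python
import json


def attach_scores(feature_collection, scores, key="name"):
    """Copy each region's score into the matching feature's properties."""
    for feature in feature_collection["features"]:
        region = feature["properties"][key]
        feature["properties"]["score"] = scores.get(region)  # None if missing
    return feature_collection


# Hypothetical inputs: shapes converted from a shapefile, scores from an API.
shapes = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "Region A"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [
                    [[121.0, 25.0], [121.1, 25.0], [121.1, 25.1], [121.0, 25.0]]
                ],
            },
        }
    ],
}
scores = {"Region A": 0.7}

geojson_text = json.dumps(attach_scores(shapes, scores), ensure_ascii=False)
print(geojson_text)
```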
#### Frontend
- load data
- basemap
- style the features
- create the ranges
#### Common pitfalls
- Data encoding
- Coordinate systems
- Check the mappings
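
Two of these pitfalls can be guarded against with small checks. The sketch below assumes attribute text may arrive as Big5 rather than UTF-8, which is a guess at a typical case rather than something stated in the talk, and relies on the fact that GeoJSON positions are ordered longitude first. Both helper names are hypothetical.

```python
def decode_attribute(raw, encodings=("utf-8", "big5")):
    """Try each candidate encoding in order and return the first clean decode."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


def is_lon_lat(position):
    """Sanity check that a position is in [longitude, latitude] range."""
    lon, lat = position
    return -180 <= lon <= 180 and -90 <= lat <= 90


raw_name = "台北市".encode("big5")  # simulated attribute bytes from an older data set
text, used = decode_attribute(raw_name)
print(text, used)                   # 台北市 big5

print(is_lon_lat([121.5, 25.0]))    # True
print(is_lon_lat([25.0, 121.5]))    # False: the pair is swapped
```

The range check only catches swapped pairs where the longitude exceeds 90 in absolute value, so it is a sanity check rather than a full validation.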
#### Optimizing for web
* file size is critical
* use [topojson](https://github.com/mbostock/topojson) to save space
* simplify the features with [mapshaper](http://www.mapshaper.org/)
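
Besides TopoJSON and mapshaper, a simpler complementary trick is to round coordinates and drop whitespace when serializing. The sketch below is not what those tools do; it only illustrates how much of a GeoJSON file is coordinate digits. Four decimal places (roughly 11 m at the equator) is an assumed precision, and the feature is made up.

```python
import json


def round_coordinates(coords, digits=4):
    """Recursively round every number in a nested GeoJSON coordinate array."""
    if isinstance(coords, (int, float)):
        return round(coords, digits)
    return [round_coordinates(c, digits) for c in coords]


feature = {
    "type": "Feature",
    "properties": {"name": "Region A"},
    "geometry": {
        "type": "LineString",
        "coordinates": [[121.123456789, 25.987654321], [121.223456789, 25.887654321]],
    },
}

before = json.dumps(feature)
feature["geometry"]["coordinates"] = round_coordinates(feature["geometry"]["coordinates"])
after = json.dumps(feature, separators=(",", ":"))  # no spaces after , and :
print(len(before), "->", len(after))
```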
#### Optimizing choropleth
* play with border styling
* make it interactive
* try different color schemes
#### How to get data?
County open data OR JSON API
* [Natural Earth](http://www.naturalearthdata.com/downloads/)
* [GADM](http://www.gadm.org)
* [World bank open data api](http://data.worldbank.org/developers?display=)
* [TGOS](http://tgos.nat.gov.tw/tgos/web/tgos_home.aspx)
* [NGIS](http://ngis.nat.gov.tw/)
#### Resources
* Formats: Shapefiles, GeoJSON, TopoJSON
* Python packages: pyshp, geojson, topojson
* Sites: the ones introduced earlier
* Tools: QGIS
* [Wiredcraft blog](https://wiredcraft.com/blog)
#### Content on UX
* interactivity
* colors + styling
* usability with mobile devices
* talk to the users
#### example
* Continues the simple approach above
* Adapted from a mapbox tutorial (?)
* mapbox loads the geojson generated by Python
* `getColor` picks a color based on the score
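
The talk's `getColor` runs in the browser, and its code is not recorded here. The Python sketch below only illustrates the same range-to-color logic. The break points and the color ramp are hypothetical.

```python
from bisect import bisect_right

# Hypothetical range boundaries and colors (light to dark).
BREAKS = [20, 40, 60, 80]
COLORS = ["#ffffcc", "#a1dab4", "#41b6c4", "#2c7fb8", "#253494"]


def get_color(score):
    """Return the color of the range that `score` falls into."""
    return COLORS[bisect_right(BREAKS, score)]


print(get_color(10))  # #ffffcc
print(get_color(40))  # #41b6c4
print(get_color(95))  # #253494
```

Keeping the breaks and colors in two parallel lists makes it easy to try different ranges or color schemes without touching the lookup.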