--- title: Day 2 SQL、爬蟲 (Neo) tags: '課程筆記' description: View the slide with "Slide Mode". --- # Day 2 SQL、爬蟲 (Neo) [TOC] --- ## SQL ### python上練習方式 安裝sqldf 教學連結:https://pypi.org/project/sqldf/ --- ### 基本語法 語法查詢:BigQuery + KeyWords select...from 選取(資料集.資料表....) distinct 查詢類別 limit 資料筆數 count 個數 order by (defult升冪,降冪desc) where 條件式 ``` SELECT * #"*"代表所有 #"distinct"可以拿來看特定欄位有什麼資料 #要再自己查查SQL資料 FROM `cf-internship.demo.dsp_uu_daily` #where click=0 #limit 20 where impress=0 order by click desc #desc是降冪(由大排到小) ```   ### Join的公式 ``` SELECT uuid,content FROM `cf-internship.demo.topic_score` as t1 join `cf-internship.demo.dsp_uu_daily` as t2 on t1.uuid = t2.uid /*mapping cloumn*/ limit 20 ```   ### 想知道每個人點擊的情況(Group by) ``` SELECT uid,sum(impress) as total_impress,sum(click) as total_click FROM `cf-internship.demo.dsp_uu_daily` group by uid limit 20 ```  ### 子查詢 (因為SQL不像Python,無法命名變數) ``` select sum(click) from (SELECT * FROM `cf-internship.demo.dsp_uu_daily` where uid is not null LIMIT 20) ```  ### 練習CTR計算 > CTR=點擊/曝光 ``` /*法一 (having)*/ SELECT uid,sum(click)/sum(impress) as CTR FROM `cf-internship.demo.dsp_uu_daily` group by uid having sum(impress) <> 0 /*"<>"排除什麼*/ /*having 用在group by,功能類似where*/ order by ctr desc /*降冪排列*/ ``` ``` /*法二 (where)*/ select uid,(total_click/total_impress) as CTR from (select uid,sum(click) as total_click, sum(impress) as total_impress from `cf-internship.demo.dsp_uu_daily` group by uid) where total_impress !=0 order by CTR desc ```  --- ### 練習不同人到不同地區次數 ``` /* group by 兩個欄位 */ SELECT uid, country, count(country) as country_total FROM `cf-internship.demo.dsp_uu_daily` group by uid, country ```  --- ### Jupyter進入方法 IP:34.80.38.243 Rita:5005 Rowan:5006 Eric:5007 ### 虛擬環境用法 Docker, Containerize python module: venv (virtual environment) 用途:避免套件衝突 > #### 更先進的方法:Docker > 可以直接提供包裝好的環境給別人,因為是獨立出來的,可以應用在各種環境 --- ## 爬蟲 Jupyter "Lab"是新版的(推薦使用) ### crawler-example-bs(美麗湯) lxml解析方法比bs預設快  ### selenium request(server 請求)爬蟲速度比selenium(開瀏覽器)快
×
Sign in
Email
Password
Forgot password
or
By clicking below, you agree to our
terms of service
.
Sign in via Facebook
Sign in via Twitter
Sign in via GitHub
Sign in via Dropbox
Sign in with Wallet
Wallet (
)
Connect another wallet
New to HackMD?
Sign up