###### tags: `Crawler` `news`

# Crawler news targets (Taiwan)
1. [東森新聞雲ettoday](https://www.ettoday.net/)
2. [聯合新聞網udn](https://udn.com/news/index)
3. [今日新聞nownews](https://www.nownews.com/)
4. [三立電子報setn](https://www.setn.com/)
5. [寰宇新聞網globalnewstv](http://globalnewstv.com.tw/)
6. [自由電子報liberty times](https://www.ltn.com.tw/)
7. [中時電子報china times](https://www.chinatimes.com/)
8. [新唐人全球新聞new tang dynasty television](https://www.ntdtv.com/b5/prog1244)
9. [香港蘋果](https://hk.news.appledaily.com/realtime/china)
10. [蘋果日報appledaily](https://tw.appledaily.com/home)

# Crawler news targets (China)
1. [央视网cctv](https://www.cctv.com/)
2. [人民网people.com](http://www.people.com.cn/BIG5/)

## News fields to collect
1. Current news website id
2. url
3. title
4. author
5. content
6. images
7. datetime
8. category of news
9. keywords

## Crawler format
Use the Scrapy framework, then output the data as JSON.
Datetime format : **YYYY-mm-dd HH:MM:SS**
[Json Editor](https://jsoneditoronline.org/)

## Scrapy sub-crawler processing
```python=
"""
version : python 3.6
crawler : {crawler name}
update : {date}
author : {name}
"""
```

## Description of each news source
# Crawler news targets (Taiwan)
- [x] 1. [東森新聞雲ettoday](https://www.ettoday.net/)
- [x] 2. [聯合新聞網udn](https://udn.com/news/index)
  GET TYPE : API
  FROM URL : https://udn.com/api/more?page=3&id=&channelId=1&cate_id=0&type=breaknews&totalRecNo=5693
- [ ] 4. [今日新聞nownews](https://www.nownews.com/)
  GET TYPE : API
  FROM URL X: https://www.nownews.com/wp-json/wp/v2/posts?per_page=10
  HOT news :
- [x] 6. [三立電子報setn](https://www.setn.com/)
- [x] 7.
[寰宇新聞網globalnewstv](http://globalnewstv.com.tw/)
  GET TYPE : HTML
  FROM URL : [real time format](http://globalnewstv.com.tw/?ajax-request=jnews&lang=zh_TW&action=jnews_module_ajax_jnews_block_3&module=true&data%5Bfilter%5D=0&data%5Bfilter_type%5D=all&data%5Bcurrent_page%5D=8&data%5Battribute%5D%5Bheader_icon%5D=&data%5Battribute%5D%5Bfirst_title%5D=%E5%8D%B3%E6%99%82%E6%96%B0%E8%81%9E&data%5Battribute%5D%5Bsecond_title%5D=%3Cspan+style%3D%22padding-left%3A+10px%3B+color%3Ared%3B%22%3EJUST+IN%3C%2Fspan%3E&data%5Battribute%5D%5Burl%5D=&data%5Battribute%5D%5Bheader_type%5D=heading_3&data%5Battribute%5D%5Bheader_background%5D=&data%5Battribute%5D%5Bheader_secondary_background%5D=&data%5Battribute%5D%5Bheader_text_color%5D=&data%5Battribute%5D%5Bheader_line_color%5D=&data%5Battribute%5D%5Bheader_accent_color%5D=&data%5Battribute%5D%5Bheader_filter_category%5D=&data%5Battribute%5D%5Bheader_filter_author%5D=&data%5Battribute%5D%5Bheader_filter_tag%5D=&data%5Battribute%5D%5Bheader_filter_text%5D=All&data%5Battribute%5D%5Bpost_type%5D=post&data%5Battribute%5D%5Bcontent_type%5D=all&data%5Battribute%5D%5Bnumber_post%5D=5&data%5Battribute%5D%5Bpost_offset%5D=0&data%5Battribute%5D%5Bunique_content%5D=disable&data%5Battribute%5D%5Binclude_post%5D=&data%5Battribute%5D%5Bexclude_post%5D=&data%5Battribute%5D%5Binclude_category%5D=&data%5Battribute%5D%5Bexclude_category%5D=&data%5Battribute%5D%5Binclude_author%5D=&data%5Battribute%5D%5Binclude_tag%5D=&data%5Battribute%5D%5Bexclude_tag%5D=&data%5Battribute%5D%5Bsort_by%5D=latest&data%5Battribute%5D%5Bdate_format%5D=custom&data%5Battribute%5D%5Bdate_format_custom%5D=m%2Fd+H%3Ai&data%5Battribute%5D%5Bexcerpt_length%5D=0&data%5Battribute%5D%5Bexcerpt_ellipsis%5D=&data%5Battribute%5D%5Bpagination_mode%5D=loadmore&data%5Battribute%5D%5Bpagination_number_post%5D=5&data%5Battribute%5D%5Bpagination_scroll_limit%5D=0&data%5Battribute%5D%5Bads_type%5D=disable&data%5Battribute%5D%5Bads_position%5D=1&data%5Battribute%5D%5Bads_random%5D=&data%5Battribute%5D%5Bads_image%5D=&data%5Battribute%5D%5Bads_image_link%5D=&data%5Battribute%5D%5Bads_image_alt%5D=&data%5Battribute%5D%5Bads_image_new_tab%5D=&data%5Battribute%5D%5Bgoogle_publisher_id%5D=&data%5Battribute%5D%5Bgoogle_slot_id%5D=&data%5Battribute%5D%5Bgoogle_desktop%5D=auto&data%5Battribute%5D%5Bgoogle_tab%5D=auto&data%5Battribute%5D%5Bgoogle_phone%5D=auto&data%5Battribute%5D%5Bcode%5D=&data%5Battribute%5D%5Bads_bottom_text%5D=&data%5Battribute%5D%5Bscheme%5D=normal&data%5Battribute%5D%5Bcolumn_width%5D=auto&data%5Battribute%5D%5Btitle_color%5D=&data%5Battribute%5D%5Baccent_color%5D=&data%5Battribute%5D%5Balt_color%5D=&data%5Battribute%5D%5Bexcerpt_color%5D=&data%5Battribute%5D%5Bcss%5D=&data%5Battribute%5D%5Bpaged%5D=1&data%5Battribute%5D%5Bcolumn_class%5D=jeg_col_1o3&data%5Battribute%5D%5Bclass%5D=jnews_block_3)
  The query parameters above must be sent with the request; enter them as params in Postman when testing.
```html=
<div class="jeg_thumb"> <a href="http://globalnewstv.com.tw/202002/97271/"> <div class="thumbnail-container animate-lazy size-715 "><img width="120" height="86" src="http://globalnewstv.com.tw/wp-content/themes/jnews/assets/img/jeg-empty.png" class="attachment-jnews-120x86 size-jnews-120x86 lazyload wp-post-image" alt="創五年新高! 流感併發重症數逼近千例" data-src="http://i1.wp.com/img.globalnewstv.com.tw/uploads/2020/02/20200216180132_73.jpg?resize=120%2C86" data-sizes="auto" data-srcset="" data-expand="700" data-animate="0" /></div> </a> </div> <div class="jeg_postblock_content"> <h3 class="jeg_post_title"> <a href="http://globalnewstv.com.tw/202002/97271/">創五年新高! 
<br>流感併發重症數逼近千例</a> </h3> <div class="jeg_post_meta"> <div class="jeg_meta_author"><span class="by">by</span> <a href="http://globalnewstv.com.tw/author/kagemichi/">Kagemichi</a></div> <div class="jeg_meta_date"><a href="http://globalnewstv.com.tw/202002/97271/"><i class="fa fa-clock-o"></i> 02/16 18:02</a></div> <div class="jeg_meta_comment"><a href="http://globalnewstv.com.tw/202002/97271/#respond"><i class="fa fa-comment-o"></i> 0</a></div> </div> <div class="jeg_post_excerpt"> <p></p> </div> </div> </article> <article class="jeg_post jeg_pl_md_2 post-97269 post type-post status-publish format-standard has-post-thumbnail hentry category-home-first category-8 tag-66 tag-2494 tag-19343 tag-19370 tag-19586 tag-covid-19 tag-19923"> <div class="jeg_thumb"> <a href="http://globalnewstv.com.tw/202002/97269/"> <div class="thumbnail-container animate-lazy size-715 "><img width="120" height="86" src="http://globalnewstv.com.tw/wp-content/themes/jnews/assets/img/jeg-empty.png" class="attachment-jnews-120x86 size-jnews-120x86 lazyload wp-post-image" alt="Fed宣布維持利率水準 鮑爾警告疫情可能影響全 球經濟" data-src="http://i0.wp.com/img.globalnewstv.com.tw/uploads/2020/01/20200130115548_73.jpg?resize=120%2C86" data-sizes="auto" data-srcset="" data-expand="700" data-animate="0" /></div> </a> </div> <div class="jeg_postblock_content"> <h3 class="jeg_post_title"> <a href="http://globalnewstv.com.tw/202002/97269/">新冠肺炎擴大檢疫條件 <br>14天內自國外返台有症狀者要篩檢</a> </h3> <div class="jeg_post_meta"> <div class="jeg_meta_author"><span class="by">by</span> <a href="http://globalnewstv.com.tw/author/kagemichi/">Kagemichi</a></div> <div class="jeg_meta_date"><a href="http://globalnewstv.com.tw/202002/97269/"><i class="fa fa-clock-o"></i> 02/16 17:46</a></div> <div class="jeg_meta_comment"><a href="http://globalnewstv.com.tw/202002/97269/#respond"><i class="fa fa-comment-o"></i> 0</a></div> </div> <div class="jeg_post_excerpt"> <p></p> </div> </div> </article> <article class="jeg_post jeg_pl_md_2 post-97267 post type-post 
status-publish format-standard has-post-thumbnail hentry category-4 category-24 tag-211 tag-6834 tag-19343 tag-19370 tag-19555 tag-19683 tag-covid-19"> <div class="jeg_thumb"> <a href="http://globalnewstv.com.tw/202002/97267/"> <div class="thumbnail-container animate-lazy size-715 "><img width="120" height="86" src="http://globalnewstv.com.tw/wp-content/themes/jnews/assets/img/jeg-empty.png" class="attachment-jnews-120x86 size-jnews-120x86 lazyload wp-post-image" alt="武漢肺炎史上最毒! WHO專家估:再1個月達高峰" data-src="http://i0.wp.com/img.globalnewstv.com.tw/uploads/2020/02/20200208172832_49.png?resize=120%2C86" data-sizes="auto" data-srcset="" data-expand="700" data-animate="0" /></div> </a> </div> <div class="jeg_postblock_content"> <h3 class="jeg_post_title"> <a href="http://globalnewstv.com.tw/202002/97267/">中國湖北省宣布 <br>24小時實施最嚴格封閉管理</a> </h3> <div class="jeg_post_meta"> <div class="jeg_meta_author"><span class="by">by</span> <a href="http://globalnewstv.com.tw/author/kagemichi/">Kagemichi</a></div> <div class="jeg_meta_date"><a href="http://globalnewstv.com.tw/202002/97267/"><i class="fa fa-clock-o"></i> 02/16 17:33</a></div> <div class="jeg_meta_comment"><a href="http://globalnewstv.com.tw/202002/97267/#respond"><i class="fa fa-comment-o"></i> 0</a></div> </div> <div class="jeg_post_excerpt"> <p></p> </div> </div> </article> <article class="jeg_post jeg_pl_md_2 post-97264 post type-post status-publish format-standard has-post-thumbnail hentry category-4 category-24 tag-46 tag-89 tag-6266 tag-19343 tag-19370 tag-19712 tag-covid-19"> <div class="jeg_thumb"> <a href="http://globalnewstv.com.tw/202002/97264/"> <div class="thumbnail-container animate-lazy size-715 "><img width="120" height="86" src="http://globalnewstv.com.tw/wp-content/themes/jnews/assets/img/jeg-empty.png" class="attachment-jnews-120x86 size-jnews-120x86 lazyload wp-post-image" alt="一夜增60起病例!鑽石公主號130人染武漢肺炎" data-src="http://i1.wp.com/img.globalnewstv.com.tw/uploads/2020/02/20200210141106_7.jpg?resize=120%2C86" 
data-sizes="auto" data-srcset="" data-expand="700" data-animate="0" /></div> </a> </div> <div class="jeg_postblock_content"> <h3 class="jeg_post_title"> <a href="http://globalnewstv.com.tw/202002/97264/">美國包機今晚抵達日本羽田 <br>接回鑽石公主號上美籍乘客</a> </h3> <div class="jeg_post_meta"> <div class="jeg_meta_author"><span class="by">by</span> <a href="http://globalnewstv.com.tw/author/kagemichi/">Kagemichi</a></div> <div class="jeg_meta_date"><a href="http://globalnewstv.com.tw/202002/97264/"><i class="fa fa-clock-o"></i> 02/16 16:30</a></div> <div class="jeg_meta_comment"><a href="http://globalnewstv.com.tw/202002/97264/#respond"><i class="fa fa-comment-o"></i> 0</a></div> </div> <div class="jeg_post_excerpt"> <p></p> </div> </div> </article> <article class="jeg_post jeg_pl_md_2 post-97262 post type-post status-publish format-standard has-post-thumbnail hentry category-home-first category-8 tag-320 tag-356 tag-357 tag-1324 tag-1640 tag-6834 tag-19922"> <div class="jeg_thumb"> <a href="http://globalnewstv.com.tw/202002/97262/"> <div class="thumbnail-container animate-lazy size-715 "><img width="120" height="86" src="http://globalnewstv.com.tw/wp-content/themes/jnews/assets/img/jeg-empty.png" class="attachment-jnews-120x86 size-jnews-120x86 lazyload wp-post-image" alt="霸王級寒流襲美 造成26人凍死" data-src="http://i1.wp.com/img.globalnewstv.com.tw/uploads/2019/02/20190201104231_47.jpg?resize=120%2C86" data-sizes="auto" data-srcset="" data-expand="700" data-animate="0" /></div> </a> </div> <div class="jeg_postblock_content"> <h3 class="jeg_post_title"> <a href="http://globalnewstv.com.tw/202002/97262/">台14甲翠峰到大禹嶺 <br>下午5點起預警性封閉</a> </h3> <div class="jeg_post_meta"> <div class="jeg_meta_author"><span class="by">by</span> <a href="http://globalnewstv.com.tw/author/kagemichi/">Kagemichi</a></div> <div class="jeg_meta_date"><a href="http://globalnewstv.com.tw/202002/97262/"><i class="fa fa-clock-o"></i> 02/16 15:58</a></div> <div class="jeg_meta_comment"><a 
href="http://globalnewstv.com.tw/202002/97262/#respond"><i class="fa fa-comment-o"></i> 0</a></div> </div> <div class="jeg_post_excerpt"> <p></p> </div> </div> </article>
```
- [x] 9. [自由電子報liberty times](https://www.ltn.com.tw/)
  GET TYPE : API
  FROM URL : https://news.ltn.com.tw/ajax/breakingnews/all/2
- [ ] 10. [中時電子報china times](https://www.chinatimes.com/)
- [ ] 11. [新唐人全球新聞new tang dynasty television](https://www.ntdtv.com/b5/prog1244)
- [ ] 12. [香港蘋果](https://hk.news.appledaily.com/realtime/china)
- [ ] 13. [蘋果日報appledaily](https://tw.appledaily.com/home)

# Crawler news targets (China)
- [ ] 1. [央视网cctv](https://www.cctv.com/)
- [ ] 2. [人民网people.com](http://www.people.com.cn/BIG5/)

Note: `per_page` = number of posts returned per request (see the nownews API URL).
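
## Example: listing fragment to JSON fields

The globalnewstv listing fragment above already carries several of the target fields (url, title, author, images, datetime), but its date comes as `m/d H:i` without a year, while the crawler format asks for **YYYY-mm-dd HH:MM:SS**. The following is a minimal sketch of that mapping, not the project's actual spider. It uses only the Python standard library so it runs without Scrapy; inside a Scrapy spider the same `jeg_*` classes could be targeted with `response.css(...)` instead. The names `ListingParser`, `parse_listing`, `to_standard_datetime` and the `year` argument are hypothetical, and the sample markup is trimmed from the first entry of the response above.

```python
import json
from datetime import datetime
from html.parser import HTMLParser

# Trimmed from the first entry of the globalnewstv listing response.
SAMPLE = """
<div class="jeg_thumb">
  <a href="http://globalnewstv.com.tw/202002/97271/">
    <img src="http://globalnewstv.com.tw/wp-content/themes/jnews/assets/img/jeg-empty.png"
         data-src="http://i1.wp.com/img.globalnewstv.com.tw/uploads/2020/02/20200216180132_73.jpg?resize=120%2C86" />
  </a>
</div>
<div class="jeg_postblock_content">
  <h3 class="jeg_post_title">
    <a href="http://globalnewstv.com.tw/202002/97271/">創五年新高! <br>流感併發重症數逼近千例</a>
  </h3>
  <div class="jeg_post_meta">
    <div class="jeg_meta_author"><span class="by">by</span> <a href="http://globalnewstv.com.tw/author/kagemichi/">Kagemichi</a></div>
    <div class="jeg_meta_date"><a href="http://globalnewstv.com.tw/202002/97271/"><i class="fa fa-clock-o"></i> 02/16 18:02</a></div>
  </div>
</div>
"""

# CSS class on the wrapping element -> output field name.
FIELD_BY_CLASS = {
    "jeg_post_title": "title",
    "jeg_meta_author": "author",
    "jeg_meta_date": "datetime",
}


class ListingParser(HTMLParser):
    """Collect one record per `jeg_thumb` block (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.items = []        # finished records
        self._cur = None       # record being filled
        self._field = None     # field whose link text is being collected
        self._in_link = False  # True while inside an <a> element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "jeg_thumb" in classes:
            # A thumbnail block marks the start of a new entry.
            self._cur = {"url": None, "title": "", "author": "",
                         "datetime": "", "images": []}
            self.items.append(self._cur)
            return
        if self._cur is None:
            return
        for name in classes:
            if name in FIELD_BY_CLASS:
                self._field = FIELD_BY_CLASS[name]
        if tag == "a":
            self._in_link = True
            if self._cur["url"] is None:  # first link is the article URL
                self._cur["url"] = attrs.get("href")
        elif tag == "img" and attrs.get("data-src"):
            # `src` is a lazy-load placeholder; the real image is in data-src.
            self._cur["images"].append(attrs["data-src"])

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag in ("h3", "div"):
            self._field = None

    def handle_data(self, data):
        # Only link text is kept, which skips the "by" label before the author.
        if self._cur is not None and self._field and self._in_link:
            self._cur[self._field] += data


def to_standard_datetime(listing_date, year):
    """Convert 'mm/dd HH:MM' plus an assumed year to 'YYYY-mm-dd HH:MM:SS'."""
    parsed = datetime.strptime(f"{year}/{listing_date.strip()}", "%Y/%m/%d %H:%M")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


def parse_listing(html, year):
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    for item in parser.items:
        item["title"] = " ".join(item["title"].split())  # collapse the <br> gap
        item["author"] = item["author"].strip()
        item["datetime"] = to_standard_datetime(item["datetime"], year)
    return parser.items


items = parse_listing(SAMPLE, year=2020)
payload = json.dumps(items, ensure_ascii=False, indent=2)  # JSON text to store
```

The listing does not expose the website id, content, category or keywords, so a real crawler would presumably still need to follow each `url` to the article page for those fields. The year has to come from elsewhere as well, for example the `/202002/` segment of the article URL or the crawl date.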