ItC Python & GitHub Homework === National Taiwan University Introduction to Computer 2019 Fall - Python and GitHub Homework > Due date [time=Mon, Dec 30, 2019 11:59 PM] # TAs > [name=r08922053 Yu-Kai Huang] > [name=r08922042 Kuang-Yu Jeng] - Instructor > [name=Winston Hsu] --- # Requirements - Crawl the announcement [page](https://www.csie.ntu.edu.tw/news/news.php?class=101) of CSIE website within specified range of dates. (Please use the request headers in TA sample codes) The results should contain but not limited to the following fields: - Post date - e.g. ``2019-05-14`` - Title - e.g. ``107學年度資訊學群畢業典禮重要公告(典禮前請務必詳閱) - 5/31更新`` - Content - recursively find all the *text* in ``<div class="editor content">`` - Please save the results to a CSV file which **can be opened by Excel** using utf-8. Please note that: - User should be able to specify the path to write the CSV file with ``--output`` argument. - Formats - Each record in one line. - Fields of a record are seperated by a comma "," with no space or new line between. - Strings in the CSV file are enclosed by a pair of double quotation mark (e.g. \"I'm string \" ). And any double quote within a string should be replace by 2 double quotation mark. For instance, the string: "Prof. Yuguang "Michael" Fang, University of Florida" should be replaced by "Prof. Yuguang ""Michael"" Fang, University of Florida" --- # What you should do - Create programming environment (<cite>[Linux environment][4]</cite>) - Init your git repository with a README - If not, there will have no master branch. Reference [this](https://docs.google.com/presentation/d/123JcZ-YwsCXcY6PYHk31_1wCss0ukQFGQ05SrOa6ZIg/edit#slide=id.g6c70dc8c07_0_0). - Clone the repository to local - (optional) Copy <cite>[TA sample codes][2]</cite> to your local repo, push to origin, and star TA repository - Start programming (<cite>[Python Toturial][3]</cite>) - After, finishing the crawler, remember to write - team members' names - school ids - Brief introduction to what the project does - Environment - e.g. CSIE Workstation, Python 3.6.2, lxml\=\=4.4.2, tqdm\=\=4.28.1, ... - collaboration contribution (which programming parts you are responsible for) - Put your git url in a file and upload to ceiba, only one person in the team should upload. --- # What TAs will run ```bash python3 main.py --start-date [start date] --end-date [end date] --output [out filename] ``` - ``--start-date`` and ``--end-date`` will be in the format of ``[Year]-[month]-[day]``. For instance, ``2019-12-09``. - ``--output`` is the csv filename to save. For instance, ``output.csv``. --- # Score Hope you get :100: ## (60 pts) Python - (10 pts) Run without error - (5 pts) Correctly parse arguments - (10 pts) Output files to correct place and can be opened by Excel and pandas.read_csv - ![](https://i.imgur.com/tuhfwSx.png) - (5 pts) Sort by post date (current to before) - (-20 pts) Sleep $0.1$ seconds before every request. This rule is **required**. You will lose points if you violate the rule. - (30 pts) Contents are correct - ==If your python contains malicious codes, all the team members will fail==. ## (40 pts) GitHub - (10 pts) Protect master branch - (10 pts) Pull Request - ==Always have pull request when merging to master branch== - Peer review with comment - (10 pts) Collaboration: $\ge 2$ people in the team commit codes - (5 pts) Neat and tidy network: Rebase is **required** when merging codes with conflict. Abnormal branch networks would be thought of as **not** neat and tidy. - ![](https://i.imgur.com/8WEhCEl.png) The above is considered as not neat and tidy. - (5 pts) Branch - Merge or delete unneccessary branches after finishing the homework - Branch name should have meaning - For example, branch name ``kai`` is meaningless ## Others - (-10 pts) Readme contains no team members' names or school ids or descriptions or environment - If no collaboration contribution is specified, TAs would think team members equally contributed to the homework. - README is generally written in markdown format, but it is optional to use the format. If you are interested in how to use markdown, you can reference <cite>[markdown tutorial][5]</cite> --- ## Related Links - <cite>[GitHub Tutorial][1]</cite> - <cite>[TA Sample code][2]</cite> - <cite>[Python Introduction][3]</cite> - <cite>[Environment Setting][4]</cite> [1]: https://docs.google.com/presentation/d/123JcZ-YwsCXcY6PYHk31_1wCss0ukQFGQ05SrOa6ZIg/edit?usp=sharing [2]: https://github.com/kaikai4n/ItC-python-hw-sample-code [3]: https://docs.google.com/presentation/d/14pCla_krES-uVRrrv-aW1XtZNFV0ArhNn89WVedeV3Y/edit?usp=sharing [4]: https://docs.google.com/presentation/d/1O43qZ5th7l5kpojirpqSCzVXqZtvL_7WZ7Z05wSCWig/edit?usp=sharing [5]: https://hackmd.io/s/features-tw