Tony Siu
# COURSE SCRAPER

- Requirements: https://hackmd.io/ZwKIk2c8SkOFmVTuYF68ww
- Meeting record: https://hackmd.io/@FiO-Internal/Hy5qXh1Ou
- Trello: https://trello.com/b/QDl5vQR3/ntpu-course-recommendation

## Current Instruction

Let's first build a minimal prototype of the "Tagging System", looking for the overlap between two sides:

1. (Done by you) Tagging of NTPU courses
2. (To do) Tagging of students

Next, use it to recommend courses/departments to students. Only after that, scrape data from other schools.

I discussed this with TW Jack earlier. Please proceed as follows:

1. Ivan (@IvanTsou): please provide the FiO app IDs of "Learning Curve"-related apps that have data.
2. Eason (@EasonC13): please provide the FiO app IDs of 靜心高中 (Jingxin High School) Learning Curve-related apps that have data.

Please share these so that Tony can carry out the follow-up work based on this data. Eason, please also show Tony how to use the FiO API to retrieve the on-chain content of a given FiO app ID for use in the tagging system.

- Accessing high school data with curl:
  - [prototype api docs for getting data](https://dev.fio.one/api-docs/bif-api/#/)
  - `curl -X GET "https://dev.fio.one/my-apps/772/data" -H "accept: application/json" -H "X-API-KEY: kCXRqaJxiNyuj5zbVgYLidGtBEMtp8ns"`

## Abstract

According to past discussions, the process is divided into:

- Data collection: course information from designated schools (one or several), and FiO Learning Curve data
- Determining the relevance between different tags/hashes
- Tagging: tags created by students themselves, and tags generated by our system

## Purpose

Build reusable data collection and extraction preprocessing for NTPU, so that the preprocessing step is abstracted away and the data format stays standardized.

**Tasks**

1. Establish correlations between different tags or hashes.
2. Build separate databases for the default (main) tag classification and for students' own tags (kept for reference).
3. Set different weights for different tags.
4. In the initial stage, collect keywords from each school's course syllabi, department descriptions, course names, etc. (in both Chinese and English).
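The curl command under **Current Instruction** can also be issued from Python, which is convenient when the response feeds straight into the tagging scripts. Below is a minimal standard-library sketch, assuming only what the curl example shows (the `/my-apps/{app_id}/data` path and the `accept` and `X-API-KEY` headers) and assuming the body is JSON; `build_app_data_request` and `fetch_app_data` are hypothetical helper names, and the API key is passed in rather than hard-coded.

```python
import json
import urllib.request

# Base URL and path pattern copied from the curl example; 772 is the app ID used there.
BASE_URL = "https://dev.fio.one"


def build_app_data_request(app_id: int, api_key: str) -> urllib.request.Request:
    """Build the same GET request as the curl command (Accept + X-API-KEY headers)."""
    url = f"{BASE_URL}/my-apps/{app_id}/data"
    headers = {"accept": "application/json", "X-API-KEY": api_key}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_app_data(app_id: int, api_key: str, timeout: float = 10.0):
    """Send the request and parse the body as JSON (assumes a JSON response)."""
    request = build_app_data_request(app_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Usage: `fetch_app_data(772, os.environ["FIO_API_KEY"])`, where `FIO_API_KEY` is a hypothetical environment variable holding the key. Splitting request construction from sending keeps the URL and headers testable without network access.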
## Technologies

- Trello
- HackMD
- Excel
- SqlDBM app
- Spyder
- Jupyter Notebook
- D3.js
- Anaconda
- FastAPI
- SQL
- Sqlite3
- MySQL
- Docker Compose
- Valentina Studio

## Bare Bones course recommendation

![](https://i.imgur.com/iX330ti.png)

- /index (GET request):
  - returns an "Adhoc course recommendation" message
- /docs (GET request): <- should be intuitive to use as a GUI
  - shows the FastAPI docs interface for trying out the APIs
- /manual_match/{user_email} (GET request):
  - takes the user email as input
  - if no email matches, returns {"error":"no data"}
  - simple greedy algorithm that matches the student's highest-weighted TF-IDF words with those of university departments
- /highschool/{user_email} (GET request):
  - pass `all` to cluster all high school data and match those clusters with the university clusters (K-means is unsupervised, so you need to determine what each cluster's feature vector means)
  - when queried by user email, performs TF-IDF-weighted clustering of the student's words and matches it with the TF-IDF-weighted university clusters
  - if no email matches, returns {"error":"no data"}
- Python libraries:
  - requests, urllib, time, re, os, bs4, chardet, lxml
  - sqlite3, pymysql, pandas, numpy, sqlalchemy
  - sklearn.preprocessing
  - custom-built functions
  - [jieba for Chinese word parsing](https://investigate.ai/text-analysis/using-tf-idf-with-chinese/)
  - [advertools: online marketing productivity and analysis tools (stopwords)](https://advertools.readthedocs.io/en/master/advertools.stopwords.html)

## Entity Relational Model

**DB Schema**

- Design a standardized database schema.

**Database Table Description** <- currently in the production DB

- chinese_course_description_bulletins_tb:
  - description bulletin of each individual course
- chinese_course_prerequisites_tb:
  - prerequisites for taking each course, written in Chinese
- chinese_query_guide_tb:
  - surface-level information from each course's description
- chinese_tec_tb:
  - aggregated weekly course/office-hour schedules of all instructors
- c_general_courses_tb:
  - aggregated course catalogs from all departments, in Chinese
- english_course_prerequisites_tb:
  - course prerequisites written in English
- course_supposed_for_elective_required_tb:
  - course serial number mapped to its target major groups and whether it is elective or required
- e_general_courses_tb:
  - aggregated department catalogs, in English
- e_general_remarks_tb:
  - remarks available only in the English department catalogs
- identity_course_limitations_tb:
  - which kinds of students may take each course serial number
- major_course_limitations_tb:
  - which majors may take each course serial number

---

- [adhoc ERM excel](https://drive.google.com/file/d/1WYtk1c6joR-sdxkx2dEcZBW6DumBhCY0/view?usp=sharing)
- [sample normalization](https://docs.google.com/spreadsheets/d/1eaE2zPFBxH_4kCqw9tOmS09IOcJ4yUXFB_i2IOoCk68/edit?usp=sharing)
- [drawio diagram](https://drive.google.com/file/d/1XYvdfVLdyxfbzNHI4w4yPxPWk71x7d8k/view?usp=sharing)
- [SqlDBM env](https://app.sqldbm.com/MySQL/Edit/p179701/#)
- [Sample PDF that can be parsed into Excel with a web service](https://drive.google.com/file/d/1nY_5KglaFIlY8-txGKmghgQYgCJo6PPG/view?usp=sharing)

![](https://i.imgur.com/DtyzWEd.png)

- Tables that have more general data fields

![](https://i.imgur.com/IOkQllH.png)

- Possible normalization plans

**Basic recommendation**

- Match different clusters of tags to each other
  - e.g., the GDSC club may be in cluster 1, "programming" <- from NTPU tag extraction
  - a student tagged with programming clubs is also in cluster 1, "programming"
- High school student tag extraction
- Tag matching
  - match the top 100 weighted TF-IDF words of the queried student with those of each university department
  - or match high school student clusters (manually labeled) to university information clusters (also manually labeled)
- Provide data
  - High school Learning Curve (text)
- Process the data
  - TF-IDF
  - Keywords

**Deployment and Packaging of NTPU recommendation**

- Use FastAPI and provide API documentation to prototype and test the recommendation.
  - Prototype only at this stage.
  - Production use will need repeated analysis, improvement, and validation testing on real data.
  - FastAPI is a Python framework, so integrating it with the TMS Learning Curve needs further project design and requirements.

**Where to scrape first?**

NTPU:

- [Chinese undergraduate courses](https://sea.cc.ntpu.edu.tw/pls/dev_stud/course_query_all.queryByReOp?qCollege=&qDept=&qDept2=GU15&qkind=%A5%B2%BF%EF%AD%D7&qYear=&qTerm=&qGrade=&qClass=&week=&seq1=A&seq2=M)
- [English undergraduate courses](https://sea.cc.ntpu.edu.tw/pls/dev_stud/COURSE_QUERY_ENG.queryByReOp?qCollege=&qDept=&qDept2=GU15&qkind=%A5%B2%BF%EF%AD%D7&qYear=&qTerm=1&qGrade=&qClass=&week=&seq1=A&seq2=M)
- Faculty diversity:
  - NTPU, National Taiwan University, National Chengchi University, National Tsing Hua University, Jiaotong University, Normal University
  - Taike, Yunke, Pingke

## 2021/07/27 Course Recommendation (preprocessing) note

- Characteristics of the student <=> characteristics of the course
  - e.g., (student) the student's CP1 and APAC records imply good logic => (course) Probability, Algorithms
- Recommend departments to students
  - Method: TF-IDF
    - Simple way: find the representative words of each department through its courses
    - Difficult way: find the representative words of each department through its department introduction
  - The easiest way currently available:
    - step 1. term cutting / Chinese word tokenization / segmentation
    - step 2. word frequency
    - step 3. remove auxiliary (function) words

NOTICE

- Do not mix in graduate-school and doctoral courses.
  - Remove serial numbers starting with N first; keep those starting with U.
  - First understand the rules for serial numbers and course numbers:
    - M: master's
    - U: undergraduate
    - N: Bachelor of Advanced Studies
    - P: in-service master's special class
  - qYear=109&qTerm=2 -> academic year (ROC calendar) & semester
- Curriculum
  - PDF: courses the department can offer
  - Webpage: courses actually offered in the semester
  - Mainly use what is "found on the website" (courses actually opened in that year)
- Should we recommend down to the "school" level, or only the "department"?
  - Example: CS vs. CSIE
  - "School" and "Department" should be kept separately
  - Conclusion: save both "School" and "Department" **(Important)**
    - Department: keep the fields "English name of department", "English abbreviation of department", "Chinese name of department", and "Chinese abbreviation of department"
    - Purpose: English abbreviations are less error-prone when merging data across schools
- No master's courses; U stands for undergraduate
- Separate tables for U and for M, N
- Scrape only courses that are active?
- If in English, attach the department to the course name
- Make every table specific to its subject
- Alternative words for course and department
- TF-IDF scoring of word feature importances
- Course description/history: e.g., an algorithms course may have had different course numbers over time?
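The NOTICE above filters courses by the first letter of their serial number (keep U, drop N, and keep U and M/N in separate tables). A minimal sketch of that rule follows, assuming each scraped course row is a dict with a hypothetical `serial` key and that the prefix is always the first character; the level labels and function names are illustrative, not taken from the production tables.

```python
from __future__ import annotations

# Prefix meanings as listed in the 07/27 NOTICE; any other prefix maps to "unknown".
SERIAL_PREFIXES = {
    "U": "undergraduate",
    "M": "master",
    "N": "bachelor_advanced_studies",
    "P": "master_in_service",
}


def classify_serial(serial: str) -> str:
    """Return the program level implied by a course serial number's first letter."""
    prefix = serial.strip()[:1].upper()
    return SERIAL_PREFIXES.get(prefix, "unknown")


def split_by_level(courses: list[dict]) -> dict[str, list[dict]]:
    """Group course rows by level so U and M/N/P rows can go to separate tables."""
    tables: dict[str, list[dict]] = {}
    for course in courses:
        tables.setdefault(classify_serial(course["serial"]), []).append(course)
    return tables


def undergraduate_only(courses: list[dict]) -> list[dict]:
    """Keep only U-prefixed (undergraduate) courses."""
    return split_by_level(courses).get("undergraduate", [])
```

Grouping first and then picking the undergraduate bucket keeps the M/N/P rows available for their own tables instead of discarding them.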
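The `/manual_match` endpoint is described as matching the student's top TF-IDF words against each department's, and the 07/27 notes list the easiest pipeline (tokenize, count words, remove auxiliary words). Below is a from-scratch sketch combining the two for English text. It assumes a toy stopword list, a regex tokenizer in place of jieba (Chinese text would need jieba segmentation), and the smoothed IDF form that scikit-learn's `TfidfVectorizer` uses by default; scoring by top-word overlap is an illustrative choice, and all names are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Toy list of "auxiliary words" (step 3); an assumption, not the project's list.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "on", "i"}


def tokenize(text: str) -> list[str]:
    """Steps 1 and 3: lowercase, split into ASCII words, drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def tfidf(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Step 2 plus IDF weighting: return {doc_id: {word: tf-idf weight}}."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for words in tokens.values() for w in set(words))
    weights = {}
    for doc_id, words in tokens.items():
        counts = Counter(words)
        total = len(words) or 1
        weights[doc_id] = {
            # Relative term frequency times smoothed IDF.
            w: (c / total) * (math.log((1 + n_docs) / (1 + df[w])) + 1)
            for w, c in counts.items()
        }
    return weights


def top_words(weights: dict[str, float], n: int) -> set[str]:
    """The n highest-weighted words (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return {w for w, _ in ranked[:n]}


def manual_match(student_text: str, departments: dict[str, str], n: int = 100):
    """Rank departments by how many top-n TF-IDF words they share with the student."""
    docs = {**departments, "__student__": student_text}
    weights = tfidf(docs)
    student_top = top_words(weights.pop("__student__"), n)
    overlap = {d: len(student_top & top_words(w, n)) for d, w in weights.items()}
    return sorted(overlap.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, a student text about programming contests and algorithms ranks a department whose course names mention algorithms and programming above a law department. The default `n=100` mirrors the "top 100 words" in **Tag matching**.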

- http://sea.cc.ntpu.edu.tw/pls/dev_stud/course_query_eng.query_frame?flag=6
- [Chinese course sample](https://sea.cc.ntpu.edu.tw/pls/dev_stud/course_query_all.queryByAllConditions?seq1=A&qCollege=%AAk%AB%DF%BE%C7%B0%7C&qYear=109&qTerm=2)
- [English course catalogs](https://sea.cc.ntpu.edu.tw/pls/dev_stud/course_query_eng.query_frame?flag=1)
- [TF-IDF feature importance preprocessing](https://towardsdatascience.com/tf-idf-for-document-ranking-from-scratch-in-python-on-real-world-dataset-796d339a4089)
- [natural language optimized DB](https://towardsdatascience.com/return-clauses-in-natural-language-queries-74a4a2fd53e6)
- [Hacker News ranking example](http://www.righto.com/2013/11/how-hacker-news-ranking-really-works.html?m=1)

## 2021/03/12 Tagging System & AI

- The relevance of different tags or hashes
  - What the system needs (collected in the background)
    - Collectible data
    - Uncollectible data
  - The student's own tags
    - Saved separately, calculated separately
    - Used as reference data, separated by weight
  - Set different weights for different tags
    - Each entity has its own tags -> database 1
    - Tags entered by each user -> database 2
  - The AI's tags should be continuously updated
- School vs. student keywords
  - Develop a basic guideline for schools and teachers to define tags
    - Course name, students: suggest keywords and let the other party choose
    - Method 1: find a school teacher
    - Method 2: parse course outlines, department descriptions, and course names (this method is more feasible)
  - Behavior cannot be faked: continuity, activities and service, time and place of participation
- Target
  - Recommend departments to students
  - Each school's course selection system -> recommended courses, course of study
  - School, department, outline
- Tag classification
  - System
  - Custom
- First scrape both Chinese and English -> then translate into Chinese
- Allow schools to upload "finished orders"; FiO provides the format
  - Event organizer name, introduction, experience
- Check entities and how to get their tags
- Make a small prototype first

## 2021/03/03 About Tagging System

About the Tagging System (Learning Curve + AI, discussed with Hana):

- Suggestions before the meeting
  1. List "specific" (sub)goals. "Specific" means: what do you want to see? It cannot be too general; ideally each goal has a quantitative definition.
  2. For each goal, list all (possibly) required data fields and attributes.
- Meeting minutes
  - Goals to understand:
    - Student learning patterns
    - Students' future learning direction
    - Student learning behavior
    - Students' learning preferences
    - Students' strengths and weaknesses
    - After analysis, give students learning suggestions
    - Achieved through the tagging system
  - Points to consider:
    - The analysis subjects need to be divided into batches (samples from different periods and batches will differ)
    - Questions to add when designing forms:
      - Preference for activities
      - Would you participate again?
    - How to obtain participants' backgrounds? -> related to the logic of horizontal analysis
  - First develop a preliminary tagging model
    - Various types of descriptions (confirm the categories, i.e., the results of personality analysis)
    - Keywords used
    - How strong the student's intention is
    - What each department would like to see
    - Each department's appeal to and cross-references with students (LMS + LRS)
  - Preliminary concept (three roles: activity organizer, student, school)
    - Collect data
      - Activity organizer (department)
        - Tag classification (department, course-name orientation)
        - Event theme
        - Keywords
        - Narrative
      - Student
        - Degree of preference
        - Experience
        - Likes/dislikes
        - Rating (1-5 stars)
        - Recommendation level
        - Would you participate again?
        - Cross-validation options (to avoid data errors)
        - Satisfaction level
      - School
        - Upload learning journey
        - Upload student aptitude test results (certified)

## FiO 台北大學 課程爬蟲 (FiO National Taipei University Course Scraper)

**Abstract**

We hope to build a system that refers to high school students' learning histories and gives advice on choosing a department. For example, many people may not know that departments such as "Drama Therapy" exist abroad. We hope that, after reading a student's learning history (activities joined, books read, camps attended, etc.), the system can suggest that this student may be suited to studying drama therapy. To do this, the first thing we need is course data for NTPU's undergraduate programs.
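The 2021/03/12 notes keep the system's (main) tags and students' own tags in separate databases and give each tag source its own weight. A minimal sketch of merging the two at scoring time follows, assuming both tables arrive as `{tag: score}` dicts; the 1.0/0.5 source weights are placeholders, not values agreed in the meetings, and the function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

# Placeholder weights per tag source (assumption; the notes do not fix values).
SOURCE_WEIGHTS = {"system": 1.0, "student": 0.5}


def combine_tags(
    system_tags: dict[str, float],
    student_tags: dict[str, float],
    weights: dict[str, float] | None = None,
) -> dict[str, float]:
    """Merge separately stored tag tables into one weighted score per tag."""
    if weights is None:
        weights = SOURCE_WEIGHTS
    combined: dict[str, float] = defaultdict(float)
    for source, tags in (("system", system_tags), ("student", student_tags)):
        for tag, score in tags.items():
            combined[tag] += weights[source] * score
    return dict(combined)


def top_tags(combined: dict[str, float], n: int = 3) -> list[str]:
    """Highest-scoring tags first (ties broken alphabetically)."""
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:n]]
```

Because the two sources are only combined when scoring, each table can still be stored and updated separately, and the weights can be tuned without touching either table.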
**Background Information**

- A very early experiment
- People involved: Ivan, Joe, Karl
- Why NTPU:
  - The collaborating professor is an NTPU professor
  - NTPU's department-level course pages are easier to scrape

**Caution**

- Because this is still a very early experiment, data storage needs to stay reasonably flexible.

**References**

[NTPU course query system (台北大學課程查詢系統)](https://sea.cc.ntpu.edu.tw/pls/dev_stud/course_query_all.query_frame?flag=1)

- [Query example](https://sea.cc.ntpu.edu.tw/pls/dev_stud/course_query_all.queryByAllConditions?seq1=A&qCollege=%AAk%AB%DF%BE%C7%B0%7C&qYear=109&qTerm=2)
- Key list:
  - qEdu:
  - qCollege:
  - qdept:
  - qYear: 109
  - qTerm: 2
  - qGrade:
  - qClass: 應修系級 (required department/grade)
  - cour:
  - teach:
  - qMemo:
  - week:
  - seq1: A
  - seq2: M

**Index Definition**

- Course serial number
  - In principle, serial numbers are consistent across years. A few courses share a serial number but have different Chinese course names (the English names are the same).
  - May need to pull out the data and inspect it
- Course department
  - Currently stored as abbreviations; these will later be changed to full names (to link them)
  - Law Section, Faculty of Law -> keep as-is
  - Language -> Language Center
  - To be confirmed
    - Consistent with the "visual inspection" terminology used by the Department of Education
    - The values were entered manually, but they still follow a regular pattern
    - Are there any strange course names? (confirm with a group-by)
- Course requirements
  - Map each course to its required/elective status; this may be tricky and depends on the program track
  - Check for special conditions
    - Business Management Department 1A 2B
      - 1: grade
      - A: group (students are divided into groups by headcount)
    - If it ends with "department", it is probably a course taught by that department (guess)
    - Others: General Education Center, Language Center, etc.
    - Special cases: enterprise master's program, state-owned enterprise master's program, summer school, business school 1, Taipei University of Science and Technology? (e.g., U2228: [link](http://sea.cc.ntpu.edu.tw/pls/dev_stud/course_query.queryGuide?g_serial=U2228&g_year=109&g_term=2&show_info=part))

![](https://i.imgur.com/GnBCV7w.jpg)

- Enrollment cap and number of students enrolled: need to scrape
- Course name
  - Scrape everything, but key on the English name: convert to lowercase and remove whitespace and symbols
  - Different departments take different calculus courses whose content is actually the same but which are listed separately (with the same English course name); conversely, courses may share a name but differ in content (e.g., a programming-language course in Python vs. C)
  - Same name, different content?

## OTHER

- If the enrolling department differs from the department offering the course, the course may be a non-core subject and should get a low weight
- Merge the different descriptions of courses with the same English course name?
- Prerequisite-based enrollment blocks: not handled

![](https://i.imgur.com/6qVJXdC.png)

### GitHub repositories

- [pdf to api](https://github.com/pdftables/python-pdftables-api)
- [sqlite-web](https://github.com/coleifer/sqlite-web)
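The **Query example** under References passes its filters as plain GET parameters, and its `qCollege` value (`%AAk%AB%DF%BE%C7%B0%7C`) is the Big5 percent-encoding of 法律學院 (College of Law). A scraper building such URLs therefore has to encode non-ASCII values as Big5 rather than UTF-8. Below is a minimal sketch with `urllib.parse`, assuming only the four parameters present in that example URL are needed; `build_query_url` and `decode_query_value` are hypothetical names.

```python
from urllib.parse import unquote, urlencode

# Endpoint taken from the "Query example" link in the References section.
QUERY_URL = (
    "https://sea.cc.ntpu.edu.tw/pls/dev_stud/"
    "course_query_all.queryByAllConditions"
)


def build_query_url(college: str, year: int, term: int, seq1: str = "A") -> str:
    """Build a course-query URL; non-ASCII values are percent-encoded as Big5."""
    params = {"seq1": seq1, "qCollege": college, "qYear": year, "qTerm": term}
    return f"{QUERY_URL}?{urlencode(params, encoding='big5')}"


def decode_query_value(value: str) -> str:
    """Decode a Big5 percent-encoded value copied from a scraped URL."""
    return unquote(value, encoding="big5")
```

`build_query_url("法律學院", 109, 2)` reproduces the Query example URL, and `decode_query_value` is handy for reading other Big5-encoded parameters such as `qkind` in the undergraduate-course links.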