# 2022 ["Information Technology Industry Project Design Course" Homework 3](https://hackmd.io/@sysprog/info2022/https%3A%2F%2Fhackmd.io%2F%40sysprog%2Finfo2022-homework3)

## Job Openings

Backend Engineer

:::spoiler [Appier - Software Engineer, Backend Development](https://boards.greenhouse.io/appier/jobs/1588292)
:::info
Task of the role
- Developing and operate scalable, reliable and maintainable service-based softwares and related components.
- Integrate the front-end modules built by your coworkers into new services
- Cowork with team members to design system architecture, choose proper technologies and plan development.
- Design and maintain database schemas for new services
- Profiling and performance tuning of critical components
- Deploy system to production and monitor service health
- Participate in idea brainstorming and contribute ideas to technology, algorithms and products
- Participate on-call rotation within Backend team to ensure product reliability and scalability

Minimum Qualification
- BS/BA degree in Computer Science or related field with 3+ years experience in related industry
- Ability to build web services on Linux.
- Good at any of the listed language: Python / Scala / Go / Node.js.
- Good knowledge of Network API Design (e.g. REST or GraphQL).
- Good understanding of any SQL/NoSQL database (MySQL / MongoDB / Redis / etc.)
- Familiar with git.

Preferred Qualification
- MS degree in Computer Science or related field.
- Good at profiler and debugging tools.
- High performance network service on Linux.
- Design and architect large scale distributed system.
- Design and implement distributed algorithm and data structure.
- Familiar with HTML and Javascript.
- Familiar with Nginx / HAProxy.
- Familiar with operation automation tool (such as Ansible).
- Familiar with continuous integration / continuous deployment
- Familiar with monitoring and alert system (Prometheus / Nagios).
- Familiar with functional programming.
- Familiar with Amazon Web Service or Google Compute Engine.
:::
:::spoiler [Mobagel - Software Engineer – Backend](https://mobagel.com/tw/blog/2022/03/core-tech-software-engineer-%e8%bb%9f%e9%ab%94%e5%b7%a5%e7%a8%8b%e5%b8%ab%ef%bc%88%e6%ad%a4%e8%81%b7%e5%8b%99%e6%8f%90%e4%be%9b%e5%af%a6%e7%bf%92%e8%a8%88%e7%95%ab%ef%bc%89/)
:::info
Task of role
- Maintain Decanter AI, our flagship analytics product, and develop new features.
- Build CI/CD flow to maintain product stability.
- Co-work with data science team to derive new functionalities.

Minimum Qualification
1. Bachelor degree or above in computer engineering/computer science or related fields.
2. Experience with Linux based OS (CentOS, Ubuntu, Arch Linux, etc.).
3. Experience with an OS scripting language.
4. Knows Python.
5. Experience with non-trivial Python package (e.g. Jinja2, Pandas, FastAPI, Flask, etc.).
6. Experience with containerization technology.
7. Experience with database and caching services.

Preferred Qualification
1. Experience working with complex software systems.
2. Experience verifying the correctness of an asynchronous program or parallel program.
3. Ability to share experience with junior members and help them grow.
4. Knowledge of machine learning algorithms and data processing techniques.
5. Experience with DevOps.
6. Experience with functional programming.
:::
:::spoiler [PicCollage - Backend Developer](https://jobs.lever.co/piccollage/acaef490-ebb0-40cb-95bf-b63497b17b2f)
:::info
Task of role
- Work directly with designs, product managers, and other engineers to help deliver new features for our internal teams and end users.
- Maintain and improve the quality and performance of our code base.

Minimum Qualification
- 2-year experience working on backend applications.
- Good understanding of OOP principles and software design.
- Experience with building and maintaining scalable APIs.
- Experience with RDBMS.
- Proficient with at least one UNIX-like system and POSIX utilities.
- Familiar with Git.

Preferred Qualification
- Ruby on Rails
- Docker
- API design and maintenance
- Experience with B2C applications
- Experience with CI/CD and DevOps concepts
:::

## Assessment of Professional Fit

### Required Skills

- System design: large-scale distributed network service systems that support high traffic (scalable, reliable).
- Database: schema design, indexing, caching ([Redis or Memcached](https://aws.amazon.com/elasticache/redis-vs-memcached/)), [locking](https://www.geeksforgeeks.org/implementation-of-locking-in-dbms/#:~:text=Locking%20protocols%20are%20used%20in,is%20called%20as%20Lock%20Manager.), relational (e.g., PostgreSQL), NoSQL (e.g., MongoDB), scalability, reliability
- Cloud-based technology (e.g., AWS, GCP, Azure).
- Backend programming language: Python, Go, Node.js, Ruby on Rails
- Other skills: Git, Linux (shell scripting), Docker, CI/CD
- Able to cooperate with frontend engineers and designers in an agile process (every startup asks for this)
- Nice to have:
    - API design: REST, GraphQL, gRPC
    - OOP and design patterns
    - Web server (Nginx), load balancer ([HAProxy](https://cloudinfrastructureservices.co.uk/haproxy-vs-nginx-whats-the-difference/#:~:text=HAProxy%20is%20open%2Dsource%20software,stability%20and%20better%20performance%20results.))
    - Profiling and stress testing.
    - DevOps:
        - [Ansible](https://www.youtube.com/watch?v=tWR1KXgEYxE): Infrastructure as Code. Automates server management.
        - [Jenkins](https://en.wikipedia.org/wiki/Jenkins_(software)): CI/CD automation server.
        - [Terraform](https://www.youtube.com/watch?v=tomUWcQ0P3k): Infrastructure as Code for cloud-based services (e.g., AWS, GCP, Azure)
    - Parallel programming, async programming

### Self-Assessment

- Soft skills
    - Fluent English.
    - Presenting: illustrate complex ideas with simple concepts.
    - Continuous learning
        - Hosted a [Git](https://en.wikipedia.org/wiki/Git) workshop to share my experience with Git.
        - Hosted a Flutter study group.
        - Read academic papers and survey the latest technology in academia.
    - Teamwork
        - In-domain
            - Worked with 4 other developers on a project.
              (Ran Scrum)
            - Async working: Git, library contribution (setting up documentation and developer guidelines)
        - Cross-domain
            - Worked with UI/UX designers to prototype a product.
            - Went through the design thinking process.
            - Tested the product in the local community.
- Projects related to backend
    - I care strongly about system robustness and am an advocate of test-driven development.
    - I have been a Linux user for 5 years and am comfortable working on Unix-based machines.
    - Wrote a web crawler in Python (using [scrapy](https://scrapy.org/)), stored the scraped data in a MySQL database, and built a RESTful API on top of it using [FastAPI](https://fastapi.tiangolo.com/).
    - Implemented a push notification system using [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events) with Node.js and dockerized the application.
    - Implemented a genome index data structure, the [FM-index](https://en.wikipedia.org/wiki/FM-index), to support fast exact-match queries for short sequences.
    - Implemented a real-time audio streaming service with [speaker diarization](https://en.wikipedia.org/wiki/Speaker_diarisation) in Python with [WebSocket](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API) and [speechbrain](https://speechbrain.github.io/).
- Less experience
    - High-traffic network applications (load balancer)
    - AWS cloud-based technology.
    - Database indexing, Redis.
    - DevOps: Ansible, logging.
    - API: GraphQL, gRPC
    - Network application stress testing and profiling.

## Interview Questions

:::spoiler System design
- Common system design questions
    - Design a social media app (e.g., Instagram, Facebook, Twitter)
    - Design a messenger app (e.g., Facebook Messenger)
    - Design a parking lot: tests your object-oriented design skills, to see whether you can apply technical thinking to physical objects.
    - Design tiny URL
- How to answer?
    - Ask clarifying questions. (e.g., What are the constraints of the system?)
    - Design at a high level.
      (e.g., data flow, API)
    - Drill down on your design
    - Bring it all together: does the final design meet the goal?
- Resources
    - [31 system design interview questions (and sample answers)](https://igotanoffer.com/blogs/tech/system-design-interviews#questions)
    - [系统设计面试题精选 (Selected System Design Interview Questions)](https://soulmachine.gitbooks.io/system-design/content/cn/)
    - [[面試][系統設計]如何設計一個像 Facebook 的社交平台 (How to design a social platform like Facebook)](https://ithelp.ithome.com.tw/articles/10278424)
    - [How to Succeed in a System Design Interview](https://blog.pramp.com/how-to-succeed-in-a-system-design-interview-27b35de0df26)
    - [Google software engineer interview: the only post you'll need to read](https://igotanoffer.com/blogs/tech/google-software-engineer-interview#system-design)
:::
:::spoiler Database
- What is an ORM?
- What is ACID?
- What is the N+1 problem?
- What is database normalization?
- What is the CAP theorem?
- What issues with traditional file-based systems make a DBMS a better choice?
- How is a database index implemented?
- What is an in-memory database?
- What is sharding?
- SQL vs NoSQL, pros and cons.
- Database scaling (replication vs sharding)
- Other [database questions](https://www.interviewbit.com/dbms-interview-questions/#dbms-mcq).
:::
:::spoiler Cloud services (AWS)
- Define and explain 3 basic types of cloud services.
    - Computing (e.g., EC2), Storage (e.g., S3), Networking (e.g., a CDN such as [CloudFront](https://aws.amazon.com/cloudfront/))
- How to upgrade or downgrade a system with near-zero downtime?
- What tools and techniques can you use in AWS to identify whether you are paying more than you should, and how do you correct it?
- What services can be used to create a centralized logging solution?
    - Amazon CloudWatch Logs, with the logs stored in Amazon S3.
- [Other questions](https://www.simplilearn.com/tutorials/aws-tutorial/aws-interview-questions#basic_aws_interview_questions)
:::
:::spoiler API design
- What is a RESTful API? And its use case.
- What is gRPC? And its use case.
- Can you name some of the main advantages of using gRPC over REST?
- What’s the difference between gRPC and REST?
- What is GraphQL? And its use case.
- What is the reason behind the development of GraphQL?
- How to do server-side caching in GraphQL? How is it different from REST?
- What is WebSocket? When is the best time to use it?
:::
:::spoiler design pattern (seems to come up more often in Java interviews)
- What are design patterns? (a concept popularized by the Gang of Four)
- Name the types of design pattern (creational, structural, behavioral)
- How are design principles different from design patterns?
:::
:::spoiler Internet
- What is DNS and how does it work?
- What is HTTP?
- What is the difference between HTTP and HTTPS?
- What is a domain name?
:::
:::spoiler OS
- How does an OS work in general?
- Process management.
- Threads and concurrency.
:::
:::spoiler Backend
- Horizontal scaling vs vertical scaling
- Migration strategies
- Mitigation strategies
    - graceful degradation
    - throttling
    - backpressure
    - load shifting
    - circuit breaker
- Security
    - HTTPS
    - CORS
    - content security policy
    - SSL/TLS
    - OWASP security risks
    - MD5 and why not use it?
    - SHA family
    - scrypt and bcrypt
:::

## Mock Interview

> :dog: Interviewee
> :smiling_imp: Interviewer

:smiling_imp:: I'd like you to design Twitter.

:dog:: Sure. Before I start, I'd like to clarify the functional requirements. I haven't used Twitter myself, but as far as I know it has roughly the following three features:

```
- Tweet: users can post tweets; a tweet must be within 280 characters
- Read tweets, ordered along a timeline
    - Users read their own past tweets
    - Users read tweets from the people they follow
- Follow other users: once you follow someone, their tweets appear in your feed
```

:smiling_imp:: Yes, you can assume we only implement these features.

:dog:: The simplest approach is to keep all data in a database with two tables. One of them stores all tweets, and its schema is

```
Tweet ID
Content
Time
User ID
```

:dog:: The other table stores all user data, with the following schema

```
User ID
Following: a list.
```

:smiling_imp:: So how would you implement the features above?
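:dog:: First, posting a tweet: the frontend sends an HTTP `POST` request, and we insert a new row into the tweet table.

:dog:: Next, reading your own tweets: the frontend sends an HTTP `GET` request, and the database selects all tweets that match that user ID, sorted from newest to oldest.

:dog:: Reading tweets from the people you follow works the same way: the frontend sends an HTTP `GET` request, and the database selects all tweets whose user ID appears in the user's `Following` list, sorted from newest to oldest.

:dog:: Finally, following someone: the frontend sends an HTTP `POST` request, and we update that user's `Following` field.

:smiling_imp:: What are the drawbacks of this approach?

:dog:: The tweet table keeps growing, yet every time a user reads tweets we have to query this large table, which will clearly be slow. On top of that, most Twitter users probably read far more than they post, which makes the problem worse.

:smiling_imp:: How could this be improved?

:dog:: In software there is rarely a single best solution, only the most suitable one. So we first have to clarify what we are optimizing for and the characteristics of our use case.

:dog:: As mentioned, read requests should outnumber post requests. But reads are not urgent: followers do not have to see a tweet the instant its author posts it. It is enough for them to see it within a reasonable time (say, within 5 minutes). This concept is called eventual consistency.

:smiling_imp:: Indeed, in Twitter's case users don't need to see tweets in real time. Can you give examples of use cases where messages do need to be seen in real time?

:dog:: A chat app like Line: if a reply arrives long after it was sent in the middle of a conversation, you would assume the other person finds you boring to talk to. Or an e-commerce platform: when product information changes, including inventory, everyone currently browsing needs to know as soon as possible. Or a stock trading app, where timing matters a lot. Bank transfers also have to be confirmed within a bounded time, so that nobody keeps spending when the balance is already zero.

:smiling_imp:: Good. So how would you improve your design?

:dog:: First, because there are so many users, we need a load balancer to distribute traffic. Next, we would have many storage nodes spread across the major regions of the world to reduce access latency. Then we can keep frequently accessed data in an in-memory database such as Redis to speed up access. This in-memory database stores every user's timeline, so when someone wants to see their feed, we use their user ID to find their table and return it.

:smiling_imp:: With such a huge number of users, how do you fetch the right table efficiently?

:dog:: We can use a hash table whose key is the User ID and whose value is the table ID, which gives us fast lookups.

:smiling_imp:: You mentioned that every approach has trade-offs. Can you explain what we sacrifice with this one?

:dog:: This approach trades space (the in-memory database) for time, so we need a lot of memory. Fortunately, tweets are mostly text, and the text has a length limit, so the problem does not blow up.

:smiling_imp:: If someone posts a tweet today, how do their followers' feeds change?

:dog:: First we fetch the feeds of all their followers, then we add the new tweet to each of them. So for every user we also need to store a list of their followers.

:smiling_imp:: What potential problem does this approach have?

:dog:: If the author is an influencer with millions of followers, we have to update millions of feeds at once, which may put a heavy load on the system.

:smiling_imp:: Do you have a solution?

:dog:: For these influencers, we do not update every follower's Redis feed when they post; we fall back to the SQL approach from earlier. When someone who follows the influencer wants to see their feed, we look up the new tweets of the influencers they follow at that moment and merge them in (a kind of deferred update). This is a hybrid approach that switches to the most suitable method depending on the use case.

:smiling_imp:: Suppose I want to implement push notifications so that the backend can send notifications to users and increase the time they spend in the app. How could I do that?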
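:dog:: I previously had a personal project that implemented an Uber-like app, where I used server-sent events to let the backend actively push messages to users.

:smiling_imp:: Great, you actually have relevant experience. Please explain what server-sent events are and why they can be used for push notifications.

:dog:: There are several ways to implement push notifications. The simplest is for the frontend to ask the server periodically whether there are new notifications, which is called polling. It is a wasteful approach, because there are not always notifications to fetch. It would be better if the server could send me a notification on its own whenever there is one. Server-sent events provide a one-way channel: only the path from the server to the client is kept open, so the client cannot send data back to the server over it and simply receives the event stream the server sends. With this mechanism, we no longer need resource-hungry polling.

:smiling_imp:: Very good. Thank you for participating today.

:dog:: Thank you.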
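
The hybrid timeline approach discussed in the interview can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not how Twitter actually works: plain Python dictionaries stand in for both the tweet table and the Redis feed cache, and `TimelineService`, `INFLUENCER_THRESHOLD`, and every method name are hypothetical names introduced for this example. Ordinary accounts fan out on write (each new tweet is pushed into every follower's cached feed), while influencer accounts skip the fan-out and are merged in when a follower reads the feed.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count

# Hypothetical cut-off: accounts with at least this many followers are
# treated as influencers. A real system would use a far larger number.
INFLUENCER_THRESHOLD = 3


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    user_id: str
    content: str
    time: int  # logical timestamp; a larger value means a newer tweet


class TimelineService:
    def __init__(self):
        self.following = defaultdict(set)        # user -> accounts they follow
        self.followers = defaultdict(set)        # user -> accounts following them
        self.tweets_by_user = defaultdict(list)  # stand-in for the tweet table
        self.feed_cache = defaultdict(list)      # stand-in for per-user Redis feeds
        self._clock = count(1)

    def follow(self, user, target):
        self.following[user].add(target)
        self.followers[target].add(user)

    def is_influencer(self, user):
        return len(self.followers[user]) >= INFLUENCER_THRESHOLD

    def post(self, user, content):
        if len(content) > 280:
            raise ValueError("a tweet must be within 280 characters")
        tick = next(self._clock)
        tweet = Tweet(tweet_id=tick, user_id=user, content=content, time=tick)
        self.tweets_by_user[user].append(tweet)
        if not self.is_influencer(user):
            # Fan-out on write: push the tweet into every follower's cached feed.
            for follower in self.followers[user]:
                self.feed_cache[follower].append(tweet)
        return tweet

    def read_feed(self, user):
        feed = list(self.feed_cache[user])
        # Deferred update: pull influencer tweets only when the feed is read.
        for target in self.following[user]:
            if self.is_influencer(target):
                feed.extend(self.tweets_by_user[target])
        # De-duplicate (an account may have become an influencer after some
        # of its tweets were already fanned out), then sort newest first.
        unique = {tweet.tweet_id: tweet for tweet in feed}
        return sorted(unique.values(), key=lambda t: t.time, reverse=True)


service = TimelineService()
for fan in ("alice", "bob", "carol"):
    service.follow(fan, "star")        # "star" reaches the influencer threshold
service.follow("alice", "dave")
service.post("dave", "hello")          # fanned out to alice's cached feed
service.post("star", "big news")       # no fan-out; merged at read time
for tweet in service.read_feed("alice"):
    print(tweet.user_id, tweet.content)
```

The sketch deliberately ignores several things a real design would need, such as backfilling older tweets when a user follows someone new, trimming cached feeds, and persisting the data.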