# Information Technology Industry Project Design, Assignment 3
## Job Descriptions Matching My Interests and Career Plans
### Meta, Data Engineer
:::spoiler Job description
- Partner with leadership, engineers, program managers and data scientists to understand data needs
- Apply proven expertise and build high-performance scalable data warehouses
- Design, build and launch efficient & reliable data pipelines to move and transform data (both large and small amounts)
- Securely source external data from numerous partners
- Intelligently design data models for optimal storage and retrieval
- Deploy inclusive data quality checks to ensure high quality of data
- Optimize existing pipelines and maintain all domain-related data pipelines
- Ownership of the end-to-end data engineering component of the solution
- Support on-call shift as needed to support the team
- Design and develop new systems in partnership with software engineers to enable quick and easy consumption of data
#### Minimum Qualifications
- BS/MS in Computer Science or a related technical field
- 5+ years of Python or other modern programming language development experience
- 5+ years of SQL and relational databases experience
- 5+ years experience in custom ETL design, implementation and maintenance
- 3+ years of experience with workflow management engines (e.g. Airflow, Luigi, Prefect, Dagster, Digdag, Google Cloud Composer, AWS Step Functions, Azure Data Factory, UC4, Control-M)
- 3+ years experience with Data Modeling
- Experience working with cloud or on-premises Big Data/MPP analytics platforms (e.g. Netezza, Teradata, AWS Redshift, Google BigQuery, Azure Data Warehouse, or similar)
- 2+ years experience working with enterprise DE tools and experience learning in-house DE tools
- Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.
#### Preferred Qualifications
- Experience with more than one coding language
- Experience designing and implementing real-time pipelines
- Experience with data quality and validation
- Experience with SQL performance tuning and end-to-end process optimization
- Experience with anomaly/outlier detection
- Experience with notebook-based Data Science workflow
- Experience with Airflow
- Experience querying massive datasets using Spark, Presto, Hive, Impala, etc.
- Experience building systems integrations, tooling interfaces, and implementing integrations for ERP systems (Oracle, SAP, Salesforce, etc.)
:::
#### Fit Assessment: Strengths and Weaknesses
- Strengths:
    - I understand the principles behind ETL tools
- Weaknesses:
    - No hands-on experience with ETL tools such as Hadoop, Flink, or Spark
### Databricks, Senior Software Engineer - Backend
:::spoiler Job description
At Databricks, we are passionate about enabling data teams to solve the world's toughest problems — from making the next mode of transportation a reality to accelerating the development of medical breakthroughs. We do this by building and running the world's best data and AI infrastructure platform so our customers can use deep data insights to improve their business. Founded by engineers — and customer obsessed — we leap at every opportunity to solve technical challenges, from designing next-gen UI/UX for interfacing with data to scaling our services and infrastructure across millions of virtual machines. And we're only getting started.
The impact you'll have:
As a software engineer with a backend focus, you will work with your team to build infrastructure and products for the Databricks platform at scale. Our backend teams span many domains, from OS/container systems, to serverless infrastructure, to partner integration, and to machine learning infrastructure.
#### Example projects
- Serverless platform: Build Databricks serverless platform that powers the big data, machine learning and Gen AI workloads. Automatically improve the performance and efficiency of workloads running on the platform while also ensuring they scale to millions of machines across a multi-cloud environment.
- Core cloud platform: Build cutting-edge OS, container and networking technologies that power the entire Databricks infrastructure, from internal infrastructure, to serverful, and to serverless platform. Improve the security, efficiency and reliability across the fleet.
- Partner ecosystem: Build and grow a partner ecosystem for Databricks SQL. Build and offer a variety of toolkits to support partners and customers’ deep and seamless integrations with Databricks. Maintain and nurture productive and beneficial partnerships within the ecosystem.
- Notebook dataplane: Build multi-language multi-user collaborative REPL experience for Databricks users with high level of reliability and responsiveness, with a focus on Data Science and AI/ML authoring and experimentation experiences in notebooks.
- Application platform: Build tools, frameworks and platforms to allow Databricks engineers to build, deploy and operate services with "batteries included". Improve the scale, observability and safety of all Databricks services across different clouds while improving every engineer's developer experience.
#### What we look for:
- BS (or higher) in Computer Science, or a related field
- 5+ years of production level experience in one of: Java, Scala, Golang, C++, or similar language
- Experience developing large-scale distributed systems
- Experience working on a SaaS platform or with Service-Oriented Architectures
- Experience with cloud technologies, e.g. AWS, Azure, GCP, or Kubernetes
- Experience with security and systems that handle sensitive data
:::
#### Fit Assessment: Strengths and Weaknesses
- Strengths:
    - Hands-on experience with AWS
- Weaknesses:
    - Unfamiliar with implementing serverless systems
    - Understand K8s conceptually, but lack hands-on depth
    - No experience building large-scale distributed systems
### Google, Software Engineer, Infrastructure, Google Cloud Storage
:::spoiler Job description
#### Minimum qualifications:
- Bachelor’s degree or equivalent practical experience.
- 2 years of experience with software development in one or more programming languages, or 1 year of experience with an advanced degree.
- 2 years of experience with data structures or algorithms.
- 2 years of experience with developing infrastructure, distributed systems or networks, or experience with compute technologies, storage or hardware architecture.
#### Preferred qualifications:
- Master's degree or PhD in Computer Science or related technical fields.
- Experience with distributed systems in the storage domain.
- Experience developing accessible technologies.
#### About the job
Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.
Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.
The US base salary range for this full-time position is $136,000-$200,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.
#### Responsibilities
- Write product or system development code.
- Participate in, or lead design reviews with peers and stakeholders to decide on available technologies.
- Review code developed by other developers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
- Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback.
- Triage product or system issues and debug/track/resolve by analyzing the sources of issues and the impact on hardware, network, or service operations and quality.
:::
#### Fit Assessment: Strengths and Weaknesses
- Strengths:
    - Hands-on experience with AWS
- Weaknesses:
    - Unfamiliar with implementing serverless systems
    - Understand K8s conceptually, but lack hands-on depth
    - No experience building large-scale distributed systems
## Reviewing My Skills and Writing a Resume
**[Resume](https://owenowenisme.github.io/dummy-cv.pdf)**
### Working Experience
#### Frontend Intern, Gogoout Co., Ltd., Taipei, Taiwan
- Migrated the frontend framework from Nuxt.js to Next.js
- Migrated image handling from the frontend server to AWS Serverless Image Handler, cutting the average loading time of high-quality images (10 MB) down to 1 s
#### React Native App Developer (Remote), Source Solution Co., Ltd, Hsinchu, Taiwan
- Collaborated on the development of a cross-platform mobile application using React Native, and successfully deployed the app on both iOS and Android
- Conducted thorough testing and debugging of the mobile application using TestFlight
### Personal Projects
#### NCKU Past-exam System
- Tech Stack: FastAPI, PostgreSQL, Next.js, TailwindCSS, Docker, Grafana, Prometheus
- Built a past-exam platform with React that lets NCKU students upload past exams and download those shared by others
#### Coursera Auto Finisher
- Created a Chrome extension that helps students complete tedious videos and readings on Coursera instantly
- Sped up students' Coursera course completion by 60%
#### Positions of Responsibility
- Project Leader, Scrum Master, Infra, NCKU Google Developer Student Club
## Mock Interview Preparation
:::info
### ETL
Q1: What is ETL, and why is it important?
A: ETL is a data-integration process consisting of three steps:
- Extract: pull data from various sources (databases, APIs, files)
- Transform: convert the data into a suitable format (cleaning, validation, reformatting)
- Load: write the data into the target system (data warehouse, data lake)

Why it matters:
- Enables integration of data from multiple sources
- Ensures data quality and consistency
- Supports business intelligence and analytics
- Provides the data source for data-driven decision making
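As a concrete illustration of the three steps, here is a minimal ETL pass in plain Python; the record layout and the validation rule are invented for the example:

```python
# Minimal ETL sketch: extract raw records, transform (clean/validate),
# then load into a target store. Schema and rules are illustrative only.

def extract() -> list[dict]:
    # Extract: in practice this would read from a DB, an API, or files.
    return [
        {"name": " Alice ", "age": "30"},
        {"name": "Bob", "age": "not-a-number"},  # dirty row
    ]

def transform(rows: list[dict]) -> list[dict]:
    # Transform: trim names, cast ages, drop rows that fail validation.
    out = []
    for row in rows:
        try:
            out.append({"name": row["name"].strip(), "age": int(row["age"])})
        except ValueError:
            continue  # invalid age -> reject the record
    return out

def load(rows: list[dict], target: list[dict]) -> None:
    # Load: append into the target store (a real pipeline writes to a warehouse).
    target.extend(rows)

warehouse: list[dict] = []
load(transform(extract()), warehouse)
```

After the run, `warehouse` holds only the single clean record; the dirty row was rejected during the transform step.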
Q2: How do you optimize ETL performance?
A:
1. Data level:
    - Incremental extraction: process only new or changed data
    - Partitioning: split large datasets and process them by partition
    - Compression: reduce the volume of data transferred
    - Indexing: build appropriate indexes
2. Processing level:
    - Parallelism: multithreading
    - Memory optimization: use in-memory caches appropriately
    - Batching: merge small operations into larger batches
3. Architecture level:
    - Distributed processing: use a distributed framework
    - Load balancing: spread the workload sensibly
    - Caching strategy: apply an appropriate caching mechanism
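The incremental-extraction idea above can be sketched with a watermark column; SQLite stands in for the source and target, and the table and column names (`users`, `dim_users`, `updated_at`) are invented for the example:

```python
import sqlite3

# Hedged sketch of incremental extraction: only rows whose updated_at is
# past the last watermark are extracted, transformed, and loaded.

def run_incremental_etl(src, dst, watermark: int) -> int:
    # Extract: only rows changed since the last run.
    rows = src.execute(
        "SELECT id, email, updated_at FROM users WHERE updated_at > ?",
        (watermark,),
    ).fetchall()

    # Transform: normalize e-mail addresses.
    cleaned = [(i, e.strip().lower(), t) for i, e, t in rows]

    # Load: upsert into the warehouse-side table.
    dst.executemany(
        "INSERT OR REPLACE INTO dim_users VALUES (?, ?, ?)", cleaned
    )
    dst.commit()
    # Advance the watermark to the newest row seen (unchanged if none).
    return max((t for _, _, t in cleaned), default=watermark)
```

Persisting the returned watermark between runs is what keeps each run touching only new or changed rows instead of rescanning the whole source table.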
:::
:::info
### System Design
Q1. Netflix system design: "Suppose you are a system architect at Netflix. Design a global video-streaming service."
- How do you ensure a smooth viewing experience for users everywhere in the world?
    - Multi-tier CDN architecture
    - Adaptive bitrate streaming (ABR)
    - Content-distribution strategy: deploy popular content to edge nodes in advance
- How does the system handle sudden traffic spikes, for example when a new series launches?
    - Content pre-warming: distribute the content to the CDN ahead of time
    - Auto-scaling: containers scale out automatically, and CDN nodes expand dynamically
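The ABR point can be illustrated with a simple throughput-based selection rule; the bitrate ladder and the 0.8 safety factor below are made-up example values, not Netflix's actual encoding ladder:

```python
# Hedged sketch of throughput-based adaptive bitrate (ABR) selection:
# pick the highest rendition that fits within a fraction of the
# currently measured network throughput.

BITRATE_LADDER_KBPS = [235, 750, 1750, 3000, 5800]  # example encodes

def pick_bitrate(throughput_kbps: float, safety: float = 0.8) -> int:
    # Leave headroom: only spend a fraction of the measured throughput.
    budget = throughput_kbps * safety
    chosen = BITRATE_LADDER_KBPS[0]  # always fall back to the lowest rung
    for rung in BITRATE_LADDER_KBPS:
        if rung <= budget:
            chosen = rung  # ladder is ascending, so this keeps the best fit
    return chosen
```

On a 4 Mbps connection this rule picks the 3000 kbps rendition; when throughput collapses, the player degrades to the lowest rung instead of stalling.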
:::
:::info
### Backend
Q1. Design a high-concurrency order system: "Suppose you are designing the order system for an e-commerce platform that must handle high-concurrency scenarios (e.g. flash sales)."
System requirements:
- 1 million orders per day in normal operation
- 100k QPS at flash-sale peaks
- Real-time order-status synchronization
- 100% inventory accuracy
Describe in detail:
- The database design
- The high-concurrency strategy
- The consistency guarantees
A:
#### DB
- Separate reads and writes (read replicas)
- Shard the DB by userId (in this scenario, users can only query their own orders)
#### High concurrency
- Redis Cluster
- Kafka -> process orders asynchronously
#### Consistency
- Optimistic locking (while keeping the user experience responsive)
- Dual writes to Redis and the primary DB
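The optimistic-locking answer can be sketched with a version column; SQLite stands in for the primary DB here, and the `inventory` schema is invented for the example:

```python
import sqlite3

# Hedged sketch of optimistic locking for flash-sale inventory: each row
# carries a version counter, and a decrement only succeeds if the version
# is unchanged since it was read, so concurrent buyers never oversell.

def reserve_one(conn: sqlite3.Connection, sku: str) -> bool:
    stock, version = conn.execute(
        "SELECT stock, version FROM inventory WHERE sku = ?", (sku,)
    ).fetchone()
    if stock <= 0:
        return False  # sold out
    cur = conn.execute(
        "UPDATE inventory SET stock = stock - 1, version = version + 1 "
        "WHERE sku = ? AND version = ?",
        (sku, version),
    )
    conn.commit()
    # rowcount == 0 means another writer bumped the version first;
    # the caller should retry instead of blocking, which is what keeps
    # the user experience responsive under contention.
    return cur.rowcount == 1
```

Unlike a pessimistic `SELECT ... FOR UPDATE`, no lock is held between read and write; contention only costs a retry, which suits short, hot flash-sale transactions.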
:::