# Minimum Developer

## Definition
- The skills needed to complete a project; after learning them, you can complete the tasks assigned by the organization

## DoD
- Can develop a new feature (function/method) (Check Point)

## Flow
- [1] Can start the project locally
- [2] Can write code that follows the team conventions
- [3] Can follow the team's Git workflow

## Must Read
- [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html)

## [1] IDE
- PyCharm
- DataSpell

## [1] File format
- yaml
- json
- .env

## [1] Environment Management
- Poetry
- pyenv (Python version management)
- Basic Docker commands
- Basic Docker Compose commands

## [1] Going Cloud
- Set up Going Cloud Project Template
- Can run a chatbot and use it

## [2] OOP
- Classes
- Inheritance
- Methods

## [2] Package / Library / Framework
- Gradio
- boto3
- FastAPI
- Pydantic

## [2] Development principle (understand the basic concepts)
- Controller
- Service
- Entity
- Repository
- Gateway

## [2] Design principle
- SOLID
- DRY: Don't Repeat Yourself
- KISS: Keep It Simple, Stupid
- YAGNI: You Ain't Gonna Need It

## [2] Others
- Type hints

## [3] Git
- Basic (Add, Commit, Branch, Push, Pull, Merge, Rebase, Stash)
- Branch Strategy (Trunk Based Development)
- Conventional Commit
- Make sure pre-commit runs

## [DoD] Going Cloud
- New Feature to Going Cloud Example Project

# Minimum IR-based LLM Application (GC-IRLLM) (IR = Information Retrieval)

## Definition
- The skills needed to complete a project; after learning them, you can complete the tasks assigned by the organization, especially making small adjustments on top of an existing project

## DoD
- New Toy Project of GC-IRLLM

## Flow
- [1] Understand the concept of an IR-based LLM application
- [2] Can start the project locally
- [3] Can call the project's API to ingest data
- [4] Can adjust prompts
- [5] Has basic QA ability

## [1] Going Cloud
- Set up GC-IRLLM Template

## [2] LLM setting (MUST)
- Temperature
- Top P
- Other hyperparameters

## [2] LLM concept (MUST)
- Types of LLMs
- Pros and Cons of LLMs

## [2] Application Concept
- What's RAG
- Standard RAG flow

## [2] Package / Library / Framework (MUST)
- LangChain
- Azure OpenAI API
- AWS Bedrock API

## [3] Going Cloud
- GC-IRLLM Data API
- Basic ETL from Raw Data

## [3] Data Chunk Strategy
- Whole
- Character Text Split

## [4] Prompting Techniques
- Role Prompting
- Tasks Prompting (QA/Summary/Generation)

## [4] Functionality
- Question Answering

## Debug
- [5] Know how to operate GC-IRLLM and inspect the retrieved data
- [5] Know basic methods for recording experiments (Excel, o11y service)

## Demo
- [DoD] Demo new toy IRLLM

---

# Basic Developer (team baseline (junior))

## Development principle
- Clean Architecture

## Environment Management
- Poetry
- pyenv (Python version management)
- Basic Docker commands
- Basic Docker Compose commands

## Package / Library / Framework
- Hugging Face Transformers
- Pandas
- NumPy
- Scikit-learn
- PyTorch or TensorFlow

## Others
- Error handling

## Linting/Formatting Settings
- black
- isort
- Flake8
- pylint
- EditorConfig

## RDBMS tools
- SQLAlchemy
- SQL syntax (must know)

## API Document
- Swagger (in FastAPI) (not required for development)

## API
- RESTful API

# Advanced Developer

## Git
- Git hook

## Profiling and Performance Debug
- pyroscope
- Memray
- VizTracer
- profyle

## Optimization skill
- asyncio (async/await)
- Thread
- Multi-process

## Testing
- Unit Testing
- Integration Testing (high level to API Testing)
- Test Driven Development
- Domain Driven Design
- UI Testing

## Test Tools
- Pytest

# Basic Machine Learning (core competencies for an MLE)

## Task
- Classification
- Clustering
- Regression

## Training Strategy
- Supervised Learning
- Unsupervised Learning
- Semi-Supervised Learning

## Knowledge
- Machine Learning (basic concepts)
- Deep Learning (basic concepts)

## Neural Network
- ANN
- VAE
- Transformer (Encoder, Decoder)

## Domain knowledge
- GenAI

## Design and Development Principles
- Golden Dataset Design
- Experiment Design

# Advanced Machine Learning (core competencies for an MLE)

## Knowledge
- Reinforcement Learning
- Online Learning

## Deploy skill
- Distilling
- Quantization

## Domain knowledge
- NLP
- RecSys

# Basic IR-based Application

## Definition
- Team baseline (junior)
- Can explain what is good and what is bad

## Application Concept
- Relevant search
- Vector Similarity Function

## Retrieve Sources (can call the basic APIs)
- Vector DB (OpenSearch/Elasticsearch)
- Document DB (OpenSearch/Elasticsearch)
- Files (PDF, ppt, etc.)

## Tools
- opensearch-py/elasticsearch-py

## Retrieve Strategy
- Single
- Fusion

## Generation Strategy
- text
- streaming
- prefix, postfix

## Data Chunk Strategy
- Whole
- Character Text Split

## Functionality
- Summarization
- Question Answering

# Advanced IR-based Application

## Elasticsearch / OpenSearch (No)
- Vector Search
- Text Search
- Index
- Term Query
- String Query
- Analyzer
- Keywords

## LLM usage concept
- Agents
- Tools
- Characters

## IR Concept
- RAG
- REALM
- RETRO

## NLP Concept
- NER
- NLU
- QA

## Package / Library / Framework
- Semantic Kernel (??? why do we need to know this)

## Retrieve Sources
- NoSQL (DynamoDB, LevelDB, Redis)
- SQL (DuckDB)
- Graph DB (Neo4j/ArangoDB)

## Data Chunk Strategy
- Document QA generation
- Parent-Child
- Knowledge Graph

## Prompting Techniques
- Few Shot Learning Prompting
- Chain of Thought
- Zero Shot Learning Prompting
- Chain of Validation
- Chain of Note
- Tree of Thoughts
- Rephrase and Respond
- Personalized (NLU, status) (composite prompts that achieve a personalized effect)

## Retrieve Strategy
- [Rerank] Reciprocal Rank Fusion
- [Rerank] Cohere

## Generation Strategy
- chunk streaming (under research)

## Functionality
- Summarization
- Multiple rounds of Dialog
- Content Generation

## Evaluation
- Context Relevancy (Recall/Precision/F1)
- Generated Answer Relevancy

## Evaluation Tools
- Ragas
- promptfoo

## LLMOps
- Prompt Versioning
- Index Versioning
- Data Versioning

## LLMOps Tools
- [Experiment] MLflow
- pezzo/BentoML
- Grafana + Prometheus
- LIDA
- llmonitor (langfuse?)

## Networks
- CORS
- HTTPS
- OAuth
- JWT

## Others
- gunicorn/uvicorn
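
As a rough illustration of the Reciprocal Rank Fusion item listed under Retrieve Strategy above, the sketch below merges two ranked result lists into one. It is only a minimal example, not the team's implementation: the function name `reciprocal_rank_fusion`, the document IDs, and the constant `k = 60` (a commonly used default) are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. `k` is a smoothing constant (assumed: 60).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector search and a keyword search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why this kind of fusion is often used to combine vector and text search results before generation.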