# Minimum Developer
## Definition
- The skills needed to complete a project; after learning them, you can complete the tasks the organization assigns
## DoD
- Can develop a new feature (function/method) (Check Point)
## Flow
- [1] Can run the project locally
- [2] Can write code that follows the team's conventions
- [3] Can follow the team's Git workflow
## Must Read
- [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html)
## [1] IDE
- PyCharm
- DataSpell
## [1] File Format
- YAML
- JSON
- .env
## [1] Environment Management
- Poetry
- pyenv (Python version management)
- Docker basic commands
- Docker Compose basic commands
## [1] Going Cloud
- Set up Going Cloud Project Template
- Can run a chatbot and use it
## [2] OOP
- Classes
- Inheritance
- Methods
## [2] Package / Library / Framework
- Gradio
- boto3
- FastAPI
- Pydantic
## [2] Development principle (understand the basic concepts)
- Controller
- Service
- Entity
- Repository
- Gateway
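As a minimal sketch of how these five roles can fit together, the example below wires them up for a user-registration case. Every name in it (`User`, `UserRepository`, `NotificationGateway`, `InMemoryUserRepository`, `RecordingGateway`, `UserService`, `create_user_controller`) is hypothetical and is not taken from the team's project template.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class User:
    """Entity: plain business data, no framework or storage details."""
    user_id: int
    name: str


class UserRepository(Protocol):
    """Repository: the persistence interface the service depends on."""
    def save(self, user: User) -> None: ...
    def get(self, user_id: int) -> Optional[User]: ...


class NotificationGateway(Protocol):
    """Gateway: the interface to an external system (mail, chat, another API)."""
    def send(self, message: str) -> None: ...


class InMemoryUserRepository:
    """A dictionary-backed repository, useful for local runs and tests."""
    def __init__(self) -> None:
        self._users: dict[int, User] = {}

    def save(self, user: User) -> None:
        self._users[user.user_id] = user

    def get(self, user_id: int) -> Optional[User]:
        return self._users.get(user_id)


class RecordingGateway:
    """A fake gateway that records messages instead of sending them."""
    def __init__(self) -> None:
        self.sent: list[str] = []

    def send(self, message: str) -> None:
        self.sent.append(message)


class UserService:
    """Service: business rules, written against the interfaces above."""
    def __init__(self, repository: UserRepository, gateway: NotificationGateway) -> None:
        self._repository = repository
        self._gateway = gateway

    def register(self, user_id: int, name: str) -> User:
        if not name.strip():
            raise ValueError("name must not be empty")
        user = User(user_id=user_id, name=name)
        self._repository.save(user)
        self._gateway.send(f"Welcome, {name}!")
        return user


def create_user_controller(service: UserService, payload: dict) -> dict:
    """Controller: translates a request into a service call and a response."""
    try:
        user = service.register(payload["user_id"], payload["name"])
    except (KeyError, ValueError) as error:
        return {"status": 400, "error": str(error)}
    return {"status": 201, "user": {"user_id": user.user_id, "name": user.name}}
```

Because the service only sees the two `Protocol` interfaces, the in-memory repository and recording gateway can later be swapped for a real database or API client without touching the business rule.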
## [2] Design principle
- SOLID
- DRY - Don’t Repeat Yourself
- KISS - Keep It Simple, Stupid
- YAGNI - You Ain’t Gonna Need It
## [2] Others
- Type hints
## [3] Git
- Basic (Add, Commit, Branch, Push, Pull, Merge, Rebase, Stash)
- Branch Strategy (Trunk Based Development)
- Conventional Commits
- Make sure pre-commit actually runs
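To make the Conventional Commits item concrete, here is a rough sketch of a header check of the kind a commit-message hook might run. The list of allowed types is an assumption based on commonly used presets, and `is_conventional_commit` is a hypothetical helper, not part of any team tooling.

```python
import re

# Commonly used commit types; this list is an assumption, teams often configure their own.
ALLOWED_TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore", "revert",
)

# Header shape: type, optional (scope), optional "!" for breaking changes, ": ", description.
HEADER_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional_commit(message: str) -> bool:
    """Return True if the first line of the commit message matches the header shape."""
    header = message.splitlines()[0] if message else ""
    return HEADER_PATTERN.match(header) is not None
```

For example, `feat(api): add chat endpoint` passes, while `added stuff` does not.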
## [DoD] Going Cloud
- New Feature to Going Cloud Example Project
# Minimum IR-based LLM Application (GC-IRLLM) (IR = Information Retrieval)
## Definition
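- The skills needed to complete a project; after learning them, you can complete the tasks the organization assigns, especially making small adjustments on top of an existing project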
## DoD
- New Toy Project of GC-IRLLM
## Flow
- [1] Understands the concept of an IR-based LLM application
- [2] Can run the project locally
- [3] Can call the project's API to feed data in
- [4] Can adjust prompts
- [5] Has basic QA ability
## [1] Going Cloud
- Set up GC-IRLLM Template
## [2] LLM setting (MUST)
- Temperature
- Top P
- Other hyperparameters
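The effect of temperature and top-p on next-token probabilities can be sketched without any LLM library. This is only an illustration of the two ideas on a toy list of logits; the function names are hypothetical, and real providers may apply these settings differently or in a different order.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    largest = max(scaled)
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def top_p_filter(probs: list[float], top_p: float) -> list[float]:
    """Keep the most likely tokens until their cumulative mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda index: probs[index], reverse=True)
    kept: set[int] = set()
    cumulative = 0.0
    for index in order:
        kept.add(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    kept_mass = sum(probs[index] for index in kept)
    return [probs[i] / kept_mass if i in kept else 0.0 for i in range(len(probs))]
```

A lower temperature concentrates probability on the most likely token, a higher one flattens the distribution, and top-p removes the low-probability tail before sampling.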
## [2] LLM concept (MUST)
- Types of LLMs
- Pros and Cons of LLMs
## [2] Application Concept
- What is RAG (Retrieval-Augmented Generation)
- Standard RAG flow
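A standard RAG flow is usually described as retrieve, augment the prompt, then generate. The toy sketch below follows those three steps under heavy simplifications: retrieval is plain word overlap rather than vector search, and the `llm` argument is any callable that maps a prompt to text. All function names and the prompt wording are hypothetical.

```python
import re
from typing import Callable


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a stand-in for a real embedding or analyzer."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Step 1: rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda document: len(question_words & tokenize(document)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, contexts: list[str]) -> str:
    """Step 2: put the retrieved passages in front of the question."""
    context_block = "\n".join(f"- {context}" for context in contexts)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(question: str, documents: list[str], llm: Callable[[str], str], top_k: int = 2) -> str:
    """Step 3: send the augmented prompt to the model."""
    contexts = retrieve(question, documents, top_k=top_k)
    return llm(build_prompt(question, contexts))
```

Passing an echo function such as `lambda prompt: prompt` as `llm` is a quick way to inspect exactly which passages were retrieved and what the model would receive.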
## [2] Package / Library / Framework (MUST)
- LangChain
- Azure OpenAI API
- AWS Bedrock API
## [3] Going Cloud
- GC-IRLLM Data API
- Basic ETL from Raw Data
## [3] Data Chunk Strategy
- Whole
- Character Text Split
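The two strategies above can be sketched as follows. This is a simplified fixed-width splitter with optional overlap, written for illustration; it is not the implementation used by LangChain or by the GC-IRLLM template, and the function names are hypothetical.

```python
def whole_document(text: str) -> list[str]:
    """'Whole' strategy: the entire document is a single chunk."""
    return [text] if text else []


def split_by_characters(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters so that a sentence cut
    at a boundary still appears intact in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
        start += step
    return chunks
```

Smaller chunks tend to make retrieval more precise but give the model less surrounding context, so chunk size and overlap are usually tuned per data set.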
## [4] Prompting Techniques
- Role Prompting
- Tasks Prompting (QA/Summary/Generation)
## [4] Functionality
- Question Answering
## Debug
- [5] Knows how to operate GC-IRLLM and inspect the retrieved data
- [5] Knows basic ways to record experiments (Excel, observability (o11y) service)
## Demo
- [DoD] Demo new toy IRLLM
---
# Basic Developer (the team's common baseline (junior))
## Development principle
- Clean Architecture
## Environment Management
- Poetry
- pyenv (Python version management)
- Docker basic commands
- Docker Compose basic commands
## Package / Library / Framework
- Hugging Face Transformers
- Pandas
- NumPy
- Scikit-learn
- PyTorch or TensorFlow
## Others
- Error handling
## Linting/Formatting Settings
- black
- isort
- Flake8
- pylint
- EditorConfig
## RDBMS tools
- SQLAlchemy
- SQL syntax (must know)
## API Document
- Swagger (in FastAPI) (not required in order to develop)
## API
- RESTful API
# Advanced Developer
## Git
- Git hook
## Profiling and Performance Debugging
- pyroscope
- Memray
- VizTracer
- profyle
## Optimization skill
- asyncio (async/await)
- Thread
- Multi-process
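For I/O-bound work, the first two options can be compared in a short sketch. The sleeps stand in for network calls, and all function names are hypothetical. Multi-process is left out here; it is usually the option to reach for when the work is CPU-bound rather than waiting on I/O.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


async def fetch_async(name: str, delay: float) -> str:
    """A coroutine; asyncio.sleep stands in for a non-blocking network call."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def run_async_tasks() -> list[str]:
    """Run three coroutines concurrently on one thread; results keep argument order."""
    return await asyncio.gather(
        fetch_async("a", 0.1),
        fetch_async("b", 0.1),
        fetch_async("c", 0.1),
    )


def fetch_blocking(name: str, delay: float) -> str:
    """A blocking call, e.g. a client library without async support."""
    time.sleep(delay)
    return f"{name} done"


def run_threaded_tasks() -> list[str]:
    """Run the blocking calls in a thread pool; map() also keeps input order."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        return list(pool.map(fetch_blocking, ["a", "b", "c"], [0.1, 0.1, 0.1]))
```

`asyncio.run(run_async_tasks())` and `run_threaded_tasks()` return the same list; in both cases the three waits overlap instead of running one after another.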
## Testing
- Unit testing
- Integration Testing (high level to API Testing)
- Test Driven Development
- Domain Driven Design
- UI Testing
## Test Tools
- Pytest
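A unit test in the style Pytest collects might look like the sketch below. `slugify` is a hypothetical function under test, chosen only to keep the example self-contained. If the file is saved with a `test_` prefix (for example `test_slugify.py`, an assumed name), running `pytest` picks up the `test_` functions.

```python
import re


def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase words joined by hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def test_slugify_joins_words_with_hyphens() -> None:
    assert slugify("Hello World") == "hello-world"


def test_slugify_drops_punctuation_and_extra_spaces() -> None:
    assert slugify("  RAG: what, why?  ") == "rag-what-why"


def test_slugify_of_empty_string_is_empty() -> None:
    assert slugify("") == ""
```

Each test checks one behavior and has a name that states the expectation, which keeps failures easy to read.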
# Basic Machine Learning (core competencies of an MLE)
## Task
- Classification
- Clustering
- Regression
## Training Strategy
- Supervised Learning
- Unsupervised Learning
- Semi-supervised Learning
## Knowledge
- Machine Learning (basic concepts)
- Deep Learning (basic concepts)
## Neural Network
- ANN
- VAE
- Transformer (Encoder, Decoder)
## Domain knowledge
- GenAI
## Design and Development Principles
- Golden dataset design
- Experiment Design
# Advanced Machine Learning (core competencies of an MLE)
## Knowledge
- Reinforcement Learning
- Online learning
## Deployment skill
- Distillation
- Quantization
## Domain knowledge
- NLP
- RecSys
# Basic IR-based Application
## Definition
- The team's common baseline (junior)
- Can explain what is good and what is bad
## Application Concept
- Relevant search
- Vector Similarity Function
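Three similarity functions that come up often in vector search are the dot product, cosine similarity, and Euclidean distance. The sketch below implements them for plain Python lists so the differences are easy to inspect; a real system would normally rely on the vector database or NumPy instead.

```python
import math


def _check_same_length(a: list[float], b: list[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def dot_product(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products; sensitive to vector length (magnitude)."""
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of the normalized vectors; depends only on direction."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Note the direction of each score: for dot product and cosine similarity a larger value means more similar, while for Euclidean distance a smaller value does.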
## Retrieve Sources (can call the basic APIs)
- Vector DB (OpenSearch/Elasticsearch)
- Document DB (OpenSearch/Elasticsearch)
- Files (PDF, PPT, etc.)
## Tools
- opensearch-py/elasticsearch-py
## Retrieve Strategy
- Single
- Fusion
## Generation Strategy
- text
- streaming
- prefix, postfix
## Data Chunk Strategy
- Whole
- Character Text Split
## Functionality
- Summarization
- Question Answering
# Advanced IR-based Application
## Elasticsearch / OpenSearch (No)
- Vector Search
- Text Search
- Index
- Term Query
- String Query
- Analyzer
- Keywords
## LLM usage concept
- Agents
- Tools
- Characters
## IR Concept
- RAG
- REALM
- RETRO
## NLP Concept
- NER
- NLU
- QA
## Package / Library / Framework
- Semantic Kernel (??? why is this required)
## Retrieve Sources
- NoSQL (DynamoDB, LevelDB, Redis)
- SQL (DuckDB)
- Graph DB (Neo4j/ArangoDB)
## Data Chunk Strategy
- Document QA generation
- Parent-Child
- Knowledge Graph
## Prompting Techniques
- Few Shot Learning Prompting
- Chain of Thought
- Zero Shot Learning Prompting
- Chain of Validation
- Chain of Note
- Tree of Thoughts
- Rephrase and Respond
- Personalized (NLU, status) (combining prompts to achieve a personalized effect)
## Retrieve Strategy
- [Rerank] Reciprocal Rank Fusion
- [Rerank] Cohere
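Reciprocal Rank Fusion merges several ranked lists by giving each document a score of 1 / (k + rank) in every list where it appears and summing those scores. The sketch below uses k = 60, the constant commonly used for RRF; the function name and the document IDs in the usage note are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs into one list, best first.

    Each ranking lists document IDs from most to least relevant. A document
    missing from a ranking simply contributes nothing for that ranking.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For instance, fusing a vector-search ranking `["d1", "d2", "d3"]` with a text-search ranking `["d3", "d1", "d4"]` puts `d1` first, because it ranks high in both lists. Only ranks are used, so the raw scores of the two retrievers do not need to be on the same scale.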
## Generation Strategy
- chunk streaming (under investigation)
## Functionality
- Summarization
- Multiple rounds of Dialog
- Content Generation
## Evaluation
- Context Relevancy (Recall/Precision/F1)
- Generated Answer Relevancy
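For context relevancy, the classic set-based definitions of precision, recall, and F1 can be sketched as below, assuming a golden dataset that labels which chunk IDs are relevant for each question. Evaluation tools may define their own variants of these metrics, so treat this as the textbook formula rather than any tool's exact computation; the function name and IDs are hypothetical.

```python
def retrieval_precision_recall_f1(
    retrieved: list[str], relevant: set[str]
) -> tuple[float, float, float]:
    """Compare retrieved chunk IDs against the labeled relevant chunk IDs."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    # Precision: what fraction of the retrieved chunks is relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: what fraction of the relevant chunks was retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

With four retrieved chunks of which two are among three relevant ones, precision is 0.5, recall is 2/3, and F1 is 4/7.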
## Evaluation Tools
- Ragas
- promptfoo
## LLMOps
- Prompt Versioning
- Index Versioning
- Data Versioning
## LLMOps Tools
- [Experiment]MLflow
- pezzo/BentoML
- Grafana + Prometheus
- LIDA
- LLMonitor (Langfuse?)
## Networks
- CORS
- HTTPS
- OAuth
- JWT
## Others
- gunicorn/uvicorn