# 🌟 **Vision Transformer (ViT) Tutorial – Part 6: Vision Transformers in Production – MLOps, Monitoring & CI/CD**

**#MLOps #ModelMonitoring #CIforML #MLflow #WandB #Kubeflow #ProductionAI #DeepLearning #ComputerVision #Transformers #AIOps**

---

## 🔹 **Table of Contents**

1. [Recap of Part 5](#recap-of-part-5)
2. [The Gap Between Research and Production](#the-gap-between-research-and-production)
3. [What is MLOps? Why It Matters for Vision Transformers](#what-is-mlops-why-it-matters-for-vision-transformers)
4. [Production Challenges: Latency, Drift, Failures](#production-challenges-latency-drift-failures)
5. [Model Monitoring: Accuracy, Latency, Throughput](#model-monitoring-accuracy-latency-throughput)
6. [Data & Concept Drift Detection in Vision Models](#data--concept-drift-detection-in-vision-models)
7. [Logging Predictions & Embeddings with Weights & Biases](#logging-predictions--embeddings-with-weights--biases)
8. [Using MLflow for Model Tracking & Experimentation](#using-mlflow-for-model-tracking--experimentation)
9. [CI/CD for ML: Testing, Versioning, Rollback](#cicd-for-ml-testing-versioning-rollback)
10. [A/B Testing & Canary Rollouts for ViT Models](#ab-testing--canary-rollouts-for-vit-models)
11. [Model Registry & Lineage with MLflow & DVC](#model-registry--lineage-with-mlflow--dvc)
12. [Anomaly Detection in Predictions](#anomaly-detection-in-predictions)
13. [Multi-Model Serving with KServe & BentoML](#multi-model-serving-with-kserve--bentoml)
14. [Scaling ViT with Kubernetes & Kubeflow](#scaling-vit-with-kubernetes--kubeflow)
15. [Security & Compliance in Production AI](#security--compliance-in-production-ai)
16. [Case Study: Deploying ViT in a Real-Time Video Pipeline](#case-study-deploying-vit-in-a-real-time-video-pipeline)
17. [Common Pitfalls & Best Practices](#common-pitfalls--best-practices)
18. [Visualizing MLOps Pipeline (Diagram)](#visualizing-mlops-pipeline-diagram)
19. [Summary & What's Next in Part 7](#summary--whats-next-in-part-7)

---

## 🔁 **1. Recap of Part 5**

In **Part 5**, we mastered **efficient Vision Transformers**:

- Learned why **efficiency** is critical for edge and mobile.
- Explored **MobileViT**, **TinyViT**, **PVT**, **Swin-T**, and **LeViT**.
- Applied **knowledge distillation**, **quantization**, and **pruning**.
- Exported models to **ONNX** for cross-platform use.
- Accelerated inference with **TensorRT**.
- Deployed models using **TorchServe** and **FastAPI**.
- Benchmarked **accuracy vs. latency vs. size**.

Now, in **Part 6**, the most comprehensive and practical part yet, we shift from **deployment** to **operations**. You'll learn how to run ViT models in **production** like a pro, with **monitoring, CI/CD, A/B testing, rollback, and MLOps**.

This is where **research meets reality**. Let's dive in!

---

## 🧩 **2. The Gap Between Research and Production**

In research, you:

- Train on clean datasets.
- Report top-1 accuracy.
- Ignore latency, cost, and failures.

In production, you:

- Deal with **noisy, real-world data**.
- Face **model drift** and **hardware failures**.
- Must ensure **reliability, scalability, and compliance**.
> πŸ’‘ **"The best model in the world is useless if it breaks in production."** ### πŸ”Ή The "Last Mile" Problem | Research Phase | Production Reality | |--------------|-------------------| | High accuracy on ImageNet | Poor performance on user-uploaded images | | Single model | Multiple versions in production | | Manual testing | Automated CI/CD pipelines | | No monitoring | 24/7 observability required | | One-time deployment | Continuous updates | > βœ… Bridging this gap is the job of **MLOps**. --- ## πŸ› οΈ **3. What is MLOps? Why It Matters for Vision Transformers** **MLOps (Machine Learning Operations)** is the practice of **applying DevOps principles to ML systems**. It includes: - **Version control** for data, code, and models. - **Automated testing** of ML pipelines. - **CI/CD** for models. - **Monitoring** in production. - **Rollback** mechanisms. - **Governance & compliance**. ### πŸ”Ή Why MLOps for ViT? Vision Transformers are: - **Large and expensive** to run. - **Sensitive to data drift** (e.g., new camera types). - **Used in critical apps** (medical, autonomous vehicles). - **Updated frequently** (new features, bug fixes). > βœ… Without MLOps, you risk: > - Silent model degradation > - Long downtime > - Regulatory violations --- ## ⚠️ **4. Production Challenges: Latency, Drift, Failures** ### πŸ”Ή **1. Latency Spikes** ViT models can slow down due to: - GPU memory pressure - Batch size changes - Network congestion > βœ… Must monitor **P99 latency**, not just average. --- ### πŸ”Ή **2. Data Drift** Input data changes over time: - New lighting conditions - Different camera resolutions - Seasonal variations (e.g., winter vs summer) Example: > A ViT trained on **daytime images** fails at **night**. --- ### πŸ”Ή **3. Concept Drift** The relationship between input and output changes: - "Cat" now includes robotic pets - New product categories in e-commerce --- ### πŸ”Ή **4. Model Failures** - GPU OOM (Out of Memory) - ONNX export bugs - Quantization artifacts - Corrupted model files > βœ… Need **automated alerts and rollback**. --- ## πŸ“Š **5. Model Monitoring: Accuracy, Latency, Throughput** ### βœ… Key Metrics to Monitor | Metric | Tool | Alert If | |------|------|---------| | **Prediction Latency** | Prometheus | P99 > 100ms | | **Request Rate** | Grafana | Sudden drop/spike | | **Error Rate** | ELK Stack | > 1% | | **GPU Utilization** | nvidia-smi | > 90% sustained | | **Model Accuracy** | Custom logging | Drop > 5% | | **Throughput (QPS)** | In-house dashboard | Below baseline | --- ### βœ… Example: Monitoring with Prometheus + Grafana ```python from prometheus_client import Counter, Histogram # Define metrics PREDICTION_LATENCY = Histogram('prediction_latency_seconds', 'Latency of ViT predictions') PREDICTION_COUNT = Counter('predictions_total', 'Total predictions made') ERROR_COUNT = Counter('prediction_errors_total', 'Total prediction errors') # In inference function start_time = time.time() try: output = model(input) PREDICTION_LATENCY.observe(time.time() - start_time) PREDICTION_COUNT.inc() except Exception as e: ERROR_COUNT.inc() raise ``` > βœ… Expose `/metrics` endpoint for Prometheus scraping. --- ## πŸ”„ **6. 
---

## 🔄 **6. Data & Concept Drift Detection in Vision Models**

### 🔹 **Data Drift**: Input distribution changes

Detect it using:

- **Embedding similarity** (cosine distance between batches)
- **Pixel statistics** (mean, variance)
- **Feature extractor outputs**

```python
import torch.nn.functional as F

def detect_drift(embeddings_old, embeddings_new, threshold=0.1):
    # Compare the mean embedding of a reference batch to a live batch
    mean_old = embeddings_old.mean(0)
    mean_new = embeddings_new.mean(0)
    distance = 1 - F.cosine_similarity(mean_old, mean_new, dim=0)
    return distance > threshold
```

> ✅ Use **PCA or UMAP** to visualize drift.

---

### 🔹 **Concept Drift**: Label meaning changes

Harder to detect without labels. Use:

- **Confidence monitoring**: a drop in the mean softmax score.
- **Human-in-the-loop**: flag low-confidence predictions for review.
- **Shadow mode**: run the new model in parallel and compare outputs.

---

## 🧪 **7. Logging Predictions & Embeddings with Weights & Biases**

[**Weights & Biases (W&B)**](https://wandb.ai) is a powerful tool for **experiment tracking and model monitoring**.

### ✅ Log Predictions

```python
import time

import torch
import wandb

wandb.init(project="vit-production")

for batch in dataloader:
    start = time.time()
    outputs = model(batch)
    probs = torch.softmax(outputs, dim=1)
    confs, preds = probs.max(dim=1)
    wandb.log({
        "latency": time.time() - start,
        "confidence": confs.mean().item(),
        "predictions": wandb.Table(
            columns=["image", "pred", "prob"],
            data=[[wandb.Image(img), pred.item(), conf.item()]
                  for img, pred, conf in zip(batch, preds, confs)]
        )
    })
```

> ✅ Visualize predictions, attention maps, and drift over time.

---

### ✅ Log Embeddings for Drift Detection

```python
# Assumes `get_embeddings` is a helper you've added to your model
embeddings = model.get_embeddings(batch).cpu().numpy()
columns = [f"dim_{i}" for i in range(embeddings.shape[1])]
wandb.log({"embeddings": wandb.Table(columns=columns, data=embeddings.tolist())})
```

Use W&B's **embedding projector** to visualize clusters.

---

## 📦 **8. Using MLflow for Model Tracking & Experimentation**

[**MLflow**](https://mlflow.org) is an open-source platform for **ML lifecycle management**.

### ✅ Log an Experiment

```python
import mlflow
import mlflow.pytorch

mlflow.set_experiment("ViT-Optimization")

with mlflow.start_run():
    mlflow.log_params({
        "model": "MobileViT-S",
        "lr": 1e-4,
        "batch_size": 64,
        "quantization": "int8"
    })

    # Train...
    accuracy = evaluate(model)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("latency", avg_latency)  # measured during evaluation

    # Save the model
    mlflow.pytorch.log_model(model, "model")
```

> ✅ Compare runs and track accuracy vs. latency.

---

### ✅ Load the Model in Production

```python
model_uri = "runs:/<run_id>/model"
model = mlflow.pytorch.load_model(model_uri)
```

> ✅ No code duplication between training and inference.

---

## 🔁 **9. CI/CD for ML: Testing, Versioning, Rollback**

### ✅ CI/CD Pipeline for ViT

```
Code Commit → Run Tests → Train Model → Evaluate → Log to MLflow
                                ↓
          If metrics OK → Export to ONNX → Run Integration Tests
                                ↓
              If all pass → Deploy to Staging → A/B Test
                                ↓
                  If success → Deploy to Production
                                ↓
            Monitor → Alert on Drift → Rollback if needed
```

---

### ✅ Automated Testing

| Test | Description |
|------|-------------|
| **Unit Test** | Check the model's forward pass |
| **Integration Test** | ONNX export, TensorRT conversion |
| **Accuracy Test** | Ensure the accuracy drop is < 1% after optimization |
| **Latency Test** | Ensure P99 < 100 ms |
| **Drift Test** | Compare embeddings on validation vs. live data |
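Several of these checks can live in an ordinary pytest suite that CI runs on every commit. A minimal sketch, assuming a hypothetical `build_model()` factory for the ViT under test and standard 224×224, 1000-class inputs:

```python
import pytest
import torch

from my_project import build_model  # hypothetical factory for the ViT under test


@pytest.fixture(scope="module")
def model():
    m = build_model()
    m.eval()
    return m


def test_forward_pass(model):
    # Unit test: a dummy batch must produce logits of the expected shape
    x = torch.randn(2, 3, 224, 224)
    with torch.no_grad():
        out = model(x)
    assert out.shape == (2, 1000)


def test_onnx_export(model, tmp_path):
    # Integration test: the model must still export cleanly to ONNX
    x = torch.randn(1, 3, 224, 224)
    torch.onnx.export(model, x, str(tmp_path / "vit.onnx"))
```

Accuracy and latency regression tests follow the same pattern: run a frozen validation batch, compare against thresholds stored in the repo, and fail the build on any violation.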
---

### ✅ Rollback Strategy

If the new model fails:

1. **Detect the failure** (latency spike, accuracy drop).
2. **Switch traffic** back to the previous version.
3. **Investigate** the root cause.
4. **Fix and redeploy**.

> ✅ Use **TorchServe model versioning** or **Kubernetes rollouts**.

---

## 🔀 **10. A/B Testing & Canary Rollouts for ViT Models**

### 🔹 **A/B Testing**

Split traffic:

- 50% → Old model (A)
- 50% → New model (B)

Compare:

- Accuracy (if labels are available)
- Latency
- User engagement

```python
# Deterministic 50/50 split keyed on user ID
if user_id % 2 == 0:
    prediction = model_a(image)
else:
    prediction = model_b(image)
```

> ✅ A safe way to validate improvements.

---

### 🔹 **Canary Rollout**

Gradually increase traffic to the new model:

- Day 1: 1%
- Day 2: 5%
- Day 3: 25%
- Day 4: 100%

Monitor at each step.

> ✅ Minimizes the blast radius of failures.

---

## 🧩 **11. Model Registry & Lineage with MLflow & DVC**

### ✅ Model Registry (MLflow)

Track model stages:

- `Staging` → `Production` → `Archived`

```python
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="ViT-Classifier",
    version=3,
    stage="Production"
)
```

> ✅ Enforce approval workflows.

---

### ✅ Data Version Control (DVC)

Version datasets and models:

```bash
# Track data
dvc add data/imagenet_train
git add data/imagenet_train.dvc

# Track a model
dvc add models/vit_quantized.pth
git add models/vit_quantized.pth.dvc
```

> ✅ Reproducible pipelines.

---

## 🚨 **12. Anomaly Detection in Predictions**

Detect unusual behavior:

- A sudden drop in confidence
- All predictions identical
- Out-of-distribution inputs

### ✅ Example: Outlier Detection with Embeddings

```python
from sklearn.ensemble import IsolationForest

# Fit on training embeddings
isolation_forest = IsolationForest(contamination=0.1)
isolation_forest.fit(train_embeddings)

# Score live data; negative scores indicate outliers
scores = isolation_forest.decision_function(live_embeddings)
anomalies = scores < 0
```

> ✅ Flag suspicious inputs for review.

---

## 🔄 **13. Multi-Model Serving with KServe & BentoML**

### ✅ **KServe (formerly KFServing)**

Kubernetes-native serving for multiple frameworks.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: vit-classifier
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: s3://models/vit-v2
```

> ✅ Scales automatically and supports canary rollouts.

---

### ✅ **BentoML**

Build model serving APIs:

```python
import bentoml
from bentoml.io import Image, NumpyNdarray

runner = bentoml.pytorch.get("vit_model:latest").to_runner()
svc = bentoml.Service("vit_classifier", runners=[runner])

@svc.api(input=Image(), output=NumpyNdarray())
def classify(image):
    # Preprocessing (resize, normalize, to-tensor) omitted for brevity
    return runner.run(image)
```

Deploy with:

```bash
bentoml serve service.py:svc
```

> ✅ One-click deployment to AWS, GCP, and Azure.

---

## 🌐 **14. Scaling ViT with Kubernetes & Kubeflow**

### ✅ Kubernetes for Scaling

- Deploy ViT as a **Deployment** with an HPA (Horizontal Pod Autoscaler).
- Use **GPU nodes** for acceleration.
- Use **Istio** for traffic management.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vit-inference
spec:
  replicas: 3
  selector:
    matchLabels:
      app: vit-inference
  template:
    metadata:
      labels:
        app: vit-inference
    spec:
      containers:
      - name: vit
        image: my-vit-model:latest
        resources:
          limits:
            nvidia.com/gpu: 1
```

---

### ✅ Kubeflow for End-to-End Pipelines

Kubeflow Pipelines lets you orchestrate (see the sketch after this list):

- Data preprocessing
- Training
- Evaluation
- Deployment

> ✅ A full MLOps workflow on Kubernetes.
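To give a flavor of the code, here is a minimal sketch of an evaluate-then-deploy pipeline using the KFP v2 SDK; the component bodies, base image, accuracy gate, and model URI are all placeholder assumptions, not a definitive implementation:

```python
from kfp import compiler, dsl


@dsl.component(base_image="python:3.10")
def evaluate_model(model_uri: str) -> float:
    # Placeholder: load the candidate ViT and return its validation accuracy
    return 0.0


@dsl.component(base_image="python:3.10")
def deploy_model(model_uri: str):
    # Placeholder: push the model to the serving platform (e.g., KServe)
    pass


@dsl.pipeline(name="vit-eval-deploy")
def vit_pipeline(model_uri: str):
    eval_task = evaluate_model(model_uri=model_uri)
    # Gate deployment on the evaluation metric
    with dsl.Condition(eval_task.output > 0.80):
        deploy_model(model_uri=model_uri)


# Compile to a YAML spec that Kubeflow Pipelines can run
compiler.Compiler().compile(vit_pipeline, "vit_pipeline.yaml")
```

Each component runs as its own pod on the cluster, so a heavy training step can request GPUs while lightweight steps stay on CPU nodes.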
---

## 🔐 **15. Security & Compliance in Production AI**

### ✅ Key Concerns

| Issue | Mitigation |
|------|-----------|
| **Model Theft** | Encrypt models, use secure serving |
| **Data Privacy** | Anonymize inputs, comply with GDPR |
| **Adversarial Attacks** | Add input validation, use adversarial training |
| **Bias & Fairness** | Audit the model across demographics |
| **Regulatory Compliance** | SOC 2, HIPAA, ISO 27001 |

> ✅ Never log raw user images without consent.

---

## 🧪 **16. Case Study: Deploying ViT in a Real-Time Video Pipeline**

### 🔹 Use Case: Smart Retail Store

Detect customer demographics and behavior in real time.

### ✅ Architecture

```
RTSP Cameras → Video Decoder → Frame Sampler
                                     ↓
                  ViT (MobileViT-S) → Age/Gender/Emotion
                                     ↓
                        Kafka → Analytics Dashboard
                                     ↓
                     Alert System (e.g., loitering)
```

### ✅ MLOps Setup

- **Monitoring**: Prometheus + Grafana for latency.
- **Drift detection**: W&B for embedding drift.
- **CI/CD**: GitHub Actions; retrain on new data weekly.
- **Rollback**: TorchServe model versioning.
- **Security**: on-premise GPU servers, no cloud.

> ✅ Runs at **30 FPS** on 4x RTX 3090 GPUs.
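The decode-and-sample stage at the front of this pipeline is often plain OpenCV. A minimal sketch of frame sampling feeding a ViT, where the stream URL, sampling rate, model path, and preprocessing (ImageNet normalization omitted) are placeholder assumptions:

```python
import cv2
import torch

STREAM_URL = "rtsp://camera-01/stream"  # placeholder camera address
SAMPLE_EVERY = 5                        # run the model on every 5th frame

# Placeholder: load the deployed MobileViT-S however your stack provides it
model = torch.jit.load("mobilevit_s.pt").eval()

cap = cv2.VideoCapture(STREAM_URL)
frame_idx = 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame_idx += 1
    if frame_idx % SAMPLE_EVERY:
        continue  # skip frames between samples
    # BGR → RGB, resize, scale to [0, 1], add a batch dimension
    rgb = cv2.cvtColor(cv2.resize(frame, (224, 224)), cv2.COLOR_BGR2RGB)
    x = torch.from_numpy(rgb).permute(2, 0, 1).float().div(255).unsqueeze(0)
    with torch.no_grad():
        preds = model(x)
    # ...publish `preds` to Kafka here...
cap.release()
```

In production, the sampled frames would be batched across cameras before inference to keep the GPUs saturated.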
---

## ⚠️ **17. Common Pitfalls & Best Practices**

### ❌ **Pitfall 1: No Monitoring**

"Works fine" until it doesn't.

✅ **Fix**: Monitor **latency, errors, and drift**.

---

### ❌ **Pitfall 2: Ignoring Data Drift**

A model trained on studio images fails on real-world photos.

✅ **Fix**: Log embeddings and detect drift.

---

### ❌ **Pitfall 3: Manual Deployments**

"Let me ssh into the server..."

✅ **Fix**: Use **CI/CD pipelines**.

---

### ❌ **Pitfall 4: No Rollback Plan**

A broken model means hours of downtime.

✅ **Fix**: Use **canary rollouts** and **versioned serving**.

---

### ✅ **Best Practices**

- Use **MLflow** for tracking.
- Log to **W&B** for visualization.
- Serve with **KServe or BentoML**.
- Monitor **24/7**.
- Automate **testing and deployment**.

---

## 🖼️ **18. Visualizing MLOps Pipeline (Diagram)**

```
+----------------+      +-------------------+
|  Code & Data   | -->  |  CI/CD Pipeline   |
|  (Git, DVC)    |      |   (Test, Train)   |
+----------------+      +---------+---------+
                                  |
                                  v
                 +----------------+----------------+
                 |         MLflow Tracking         |
                 | (Experiments, Models, Metrics)  |
                 +----------------+----------------+
                                  |
                                  v
                 +----------------+----------------+
                 |    Model Registry (Staging)     |
                 +----------------+----------------+
                                  |
                                  v
                 +----------------+----------------+
                 |   A/B Test or Canary Rollout    |
                 +----------------+----------------+
                                  |
          +-----------------------+-----------------------+
          |                       |                       |
+---------v---------+   +---------v---------+   +---------v---------+
|   Old Model (A)   |   |   New Model (B)   |   | Shadow Mode (Log) |
+---------+---------+   +---------+---------+   +---------+---------+
          |                       |                       |
          +-----------------------+-----------------------+
                                  |
                                  v
                 +----------------+----------------+
                 |       Production Serving        |
                 | (KServe, TorchServe, BentoML)   |
                 +----------------+----------------+
                                  |
                                  v
                 +----------------+----------------+
                 |      Monitoring & Alerts        |
                 |    (Prometheus, W&B, ELK)       |
                 +----------------+----------------+
                                  |
                                  v
                 +----------------+----------------+
                 |  Drift Detection & Rollback     |
                 +---------------------------------+
```

> 🔍 This is a **production-grade MLOps pipeline** for Vision Transformers.

---

## 🏁 **19. Summary & What's Next in Part 7**

### ✅ **What You've Learned in Part 6**

- The **gap between research and production**.
- **MLOps** principles for Vision Transformers.
- Monitoring models for **latency, drift, and failures**.
- Tracking with **MLflow** and **Weights & Biases**.
- Building **CI/CD pipelines** for ML.
- Performing **A/B testing** and **canary rollouts**.
- Managing a **model registry and lineage**.
- Detecting **anomalies** and **outliers**.
- Serving models with **KServe, BentoML, and TorchServe**.
- Scaling with **Kubernetes and Kubeflow**.
- Ensuring **security and compliance**.

---

### 🔜 **What's Coming in Part 7: The Future of Vision Transformers – Multimodal, 3D, and Beyond**

In the final part, we'll explore:

- 🧠 **Multimodal Transformers**: CLIP, Flamingo, PaLM-E.
- 🌍 **3D Vision Transformers**: for point clouds and meshes.
- 🎥 **Video & Temporal Modeling**: TimeSformer, ViViT.
- 🤖 **Embodied AI**: robots using ViT for navigation.
- 🧬 **Medical Vision Transformers**: for pathology and radiology.
- 🚀 **Next-Gen Architectures**: Mamba, RetNet, and alternatives to attention.
- 🌐 **Vision Transformers in Web & AR/VR**.

> 📌 **#MultimodalAI #3DViT #TimeSformer #PaLME #MedicalAI #FutureOfAI**

---

## 🙌 Final Words

You've just completed one of the most in-depth explorations of production Vision Transformers available.

> 💬 **"In the real world, the model is just the beginning. The system is what matters."**

You now know how to take a ViT model from a **research notebook** to a **24/7 production system**, with monitoring, CI/CD, rollback, and scalability.

In **Part 7**, we'll look to the **future**, where Vision Transformers merge with language, 3D, robotics, and medicine.

---

📌 **Pro Tip**: Document your MLOps pipeline. Your future self will thank you.

🔁 **Share this guide** with your team: it's a playbook for **real-world AI**.

---

✅ **You're now ready for Part 7!** The final chapter: **The Future of Vision Transformers**.

#MLOps #ModelMonitoring #CIforML #MLflow #WandB #Kubeflow #ProductionAI #DeepLearning #ComputerVision #Transformers #AIOps