This report presents a detailed analysis of 100 AI models available in 2025, comparing cloud-based and locally deployable solutions side by side. The research reveals a transformed landscape in which open-source models rival proprietary offerings, costs have fallen 285x since 2022, and consumer hardware can now run 70B-parameter models through quantization.
Model Name | Provider/Developer | Description | Context Window | Cost (per 1M tokens) | Model Size | Local/Cloud | Hardware Requirements | Model Modes | Tags | Access URL | Access Requirements | MMLU | HumanEval | MT-Bench/Arena | GPQA | SWE-bench |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GPT-4.1 | OpenAI | Latest flagship with extensive world knowledge and agentic planning | 1M tokens | $2 input / $8 output | Not disclosed | Cloud | N/A | Text, vision | flagship, multimodal, reasoning, general-purpose, vision, planning, creative, enterprise, production, latest | platform.openai.com | API key required | 90.2% | - | ~1350 Elo | - | 54.6% |
GPT-4o | OpenAI | Multimodal omni model, 2x faster than GPT-4 Turbo | 128K tokens | $2.50 input / $10 output | Not disclosed | Cloud | N/A | Text, vision, audio (planned) | multimodal, fast, cost-effective, vision, audio-planned, general-purpose, enterprise, api, production, established | platform.openai.com | API key required | 88.7% | 90.2% | ~1350 Elo | - | - |
o3 | OpenAI | Advanced reasoning for coding, math, science | 200K tokens | $10 input / $40 output | Not disclosed | Cloud | N/A | Text, vision, code, reasoning | reasoning, coding, math, science, research, extended-thinking, problem-solving, logic, advanced, flagship | ChatGPT Pro/Plus, API | Subscription required | - | - | High | 83.3% | - |
o3-mini | OpenAI | Fast, cost-efficient reasoning for STEM | 200K tokens | $0.55 input / $4.40 output | Not disclosed | Cloud | N/A | Text, coding, math | reasoning, efficient, stem, coding, math, cost-effective, fast, accessible, education, research | ChatGPT, API | Free tier available | - | - | - | - | - |
Claude Opus 4 | Anthropic | Most powerful coding model with sustained performance | 200K tokens | $15 input / $75 output | Not disclosed | Cloud | N/A | Text, vision, tools | coding, reasoning, enterprise, flagship, agentic, tool-use, premium, high-performance, professional, advanced | claude.ai, API | Pro subscription | - | - | High | - | 72.5% |
Claude Sonnet 4 | Anthropic | High-performance model with exceptional reasoning | 200K tokens | $3 input / $15 output | Not disclosed | Cloud | N/A | Text, vision, tools | reasoning, balanced, cost-effective, general-purpose, tool-use, production, enterprise, versatile, popular, reliable | claude.ai, API | API key | - | - | High | - | 72.7% |
Claude 3.7 Sonnet | Anthropic | First hybrid reasoning model with extended thinking | 200K tokens | $3 input / $15 output | Not disclosed | Cloud | N/A | Text, vision, thinking modes | reasoning, hybrid, extended-thinking, agentic, coding, innovative, flexible, research, analysis, advanced | claude.ai, API | API key | 85% | - | Leading | 84.8% | 62.3% |
Gemini 2.5 Pro | Google | Advanced reasoning with built-in thinking capabilities | 1M tokens | $1.25-$2.50 input / $10-$15 output | Not disclosed | Cloud | N/A | Text, images, audio, video, code | reasoning, multimodal, long-context, enterprise, video, audio, comprehensive, google, production, flagship | ai.google.dev | Google account | 85.8% | - | - | 84.0% | 63.8% |
Gemini 2.5 Flash | Google | Fast thinking model optimized for speed | 1M tokens | TBD (preview) | Not disclosed | Cloud | N/A | Text, images, audio, video, code | fast, efficient, multimodal, preview, cost-effective, google, production-ready, versatile, quick, responsive | ai.google.dev | Google account | - | - | - | - | - |
Grok 3 | xAI | Flagship reasoning with Think and Big Brain modes | 1M tokens | $3 input / $15 output | 2.7T parameters | Cloud | N/A | Text, images | reasoning, real-time, uncensored, x-integration, think-mode, truth-seeking, research, analysis, social, web-aware | grok.x.ai | X Premium+ | 79.9% | - | - | 75.4% | - |
Llama 3.3 70B | Meta | Performance comparable to 405B model | 128K tokens | $0.59 input / $0.70 output | 70B | Both | 35GB VRAM (Q4) | Text | open-source, efficient, multilingual, high-performance, cost-effective, local-capable, enterprise, research, production, popular | llama.meta.com | Download/API | 86.0% | 88.4% | - | 51.1% | - |
Llama 3.2 90B Vision | Meta | First multimodal Llama with vision | 128K tokens | Free/API pricing | 90B | Both | 90GB VRAM (Q4) | Text, vision | multimodal, vision, open-source, large, research, computer-vision, image-understanding, meta, advanced, flagship | llama.meta.com | Download required | - | - | - | - | - |
Llama 3.1 405B | Meta | Largest open model with extensive capabilities | 128K tokens | Higher tier pricing | 405B | Both | Multi-GPU setup | Text | flagship, open-source, massive, research, high-performance, sota, enterprise, meta, comprehensive, powerful | llama.meta.com | Download/cloud | 88.6% | 89.0% | - | - | - |
Mistral Medium 3 | Mistral | Multimodal frontier model for enterprise | 128K tokens | $0.40 input / $2.00 output | 20-50B est. | Both | 4+ GPUs | Text, vision | multimodal, enterprise, cost-effective, vision, french, european, production, balanced, efficient, popular | mistral.ai | API key | ~85% | ~85% | - | - | - |
Mixtral 8x22B | Mistral | Sparse MoE with 141B total parameters | 65K tokens | Free/API pricing | 141B (39B active) | Both | 90GB VRAM (Q4) | Text | moe, open-source, efficient, large, sparse, research, french, innovative, scalable, powerful | mistral.ai | Download/API | - | - | - | - | - |
DeepSeek-R1 7B | DeepSeek | First-gen reasoning model comparable to o1 | 32K-128K tokens | Free/open | 7B | Local | RTX 3060 (8GB) | Text, reasoning | reasoning, open-source, efficient, chinese, innovative, research, logic, math, accessible, popular | ollama.com/library | ollama pull deepseek-r1:7b | - | - | - | - | - |
Qwen 2.5 7B | Alibaba | Latest gen with multilingual support | 128K tokens | Free/open | 7B | Local | RTX 3070 (8GB) | Text | multilingual, open-source, efficient, chinese, versatile, long-context, production, popular, accessible, comprehensive | ollama.com/library | ollama pull qwen2.5:7b | - | - | - | - | - |
Phi-4 14B | Microsoft | State-of-the-art reasoning in compact size | 128K tokens | Free/open | 14B | Local | RTX 3090 (16GB+) | Text | reasoning, efficient, microsoft, math, compact, powerful, research, accessible, innovative, popular | ollama.com/library | ollama pull phi4:14b | - | - | - | - | - |
CodeLlama 34B | Meta | Specialized for code generation | 16K tokens | Free/open | 34B | Local | RTX 4090 (24GB) | Code, text | coding, open-source, fill-in-middle, meta, specialized, development, programming, tool, popular, efficient | ollama.com/library | ollama pull codellama:34b | - | ~81% | - | - | - |
DeepSeek-Coder V2 16B | DeepSeek | MoE code model, GPT4-Turbo comparable | 128K tokens | Free/open | 16B | Local | RTX 4070+ (12GB) | Code | coding, moe, chinese, high-performance, specialized, development, efficient, innovative, popular, advanced | ollama.com/library | ollama pull deepseek-coder-v2:16b | - | High | - | - | - |
Dolphin 3.0 8B | Eric Hartford | Uncensored ultimate general purpose | 128K tokens | Free/open | 8B | Local | RTX 3070+ (8GB) | Text | uncensored, general-purpose, versatile, community, creative, unrestricted, popular, flexible, agentic, function-calling | ollama.com/library | ollama pull dolphin3:8b | - | - | - | - | - |
Hermes 3 70B | Nous Research | Flagship uncensored, highly steerable | 128K tokens | Free/open | 70B | Local | Dual RTX 4090 | Text | uncensored, steerable, creative, research, community, powerful, flexible, advanced, flagship, popular | ollama.com/library | ollama pull hermes3:70b | - | - | - | - | - |
TinyLlama 1.1B | Zhang et al. | Compact model trained on 3T tokens | 4K tokens | Free/open | 1.1B | Local | RTX 3050+ (4GB) | Text | efficient, mobile, edge, tiny, accessible, lightweight, fast, community, research, popular | ollama.com/library | ollama pull tinyllama:1.1b | - | - | - | - | - |
SmolLM2 1.7B | Hugging Face | Ultra-compact with tool support | 8K tokens | Free/open | 1.7B | Local | RTX 3050+ (4GB) | Text, tools | ultra-compact, efficient, tools, huggingface, lightweight, accessible, innovative, mobile, edge, community | ollama.com/library | ollama pull smollm2:1.7b | - | - | - | - | - |
LLaVA 1.6 34B | Microsoft/UW | Vision + language understanding | 8K tokens | Free/open | 34B | Local | RTX 4090 (24GB) | Vision, text | multimodal, vision, open-source, research, ocr, image-understanding, microsoft, academic, powerful, innovative | ollama.com/library | ollama pull llava:34b | - | - | - | - | - |
Command A | Cohere | Most performant Cohere model, 150% faster | 256K tokens | Enterprise pricing | Not disclosed | Cloud | N/A | Text | enterprise, fast, multilingual, rag, tool-use, canadian, production, agentic, commercial, advanced | api.cohere.ai | API key | - | - | - | - | - |
Command R+ 08-2024 | Cohere | Scalable LLM for enterprise use | 128K tokens | $3 input / $15 output | Not disclosed | Cloud | N/A | Text | enterprise, rag, tool-use, multilingual, canadian, scalable, production, commercial, reliable, established | api.cohere.ai | API key | - | - | - | - | - |
Jamba 1.6 Large | AI21 Labs | Hybrid SSM-Transformer architecture | 256K tokens | $3.50/1M blended | Not disclosed | Cloud | N/A | Text | hybrid, long-context, efficient, israeli, innovative, enterprise, research, unique, commercial, advanced | AI21 Studio API | API key | - | - | - | - | - |
Sonar Pro | Perplexity | Advanced search with deep understanding | Not specified | $5/1K searches + token costs | Not disclosed | Cloud | N/A | Text, search | search, real-time, citations, research, factual, web-aware, advanced, premium, commercial, specialized | docs.perplexity.ai | API key | - | - | - | - | - |
Amazon Nova Pro | Amazon | Multimodal with 1M token context | 1M tokens | $0.008 input / $0.032 output | Not disclosed | Cloud | N/A | Text, image, video | multimodal, aws, cost-effective, video, enterprise, amazon, cloud-native, scalable, production, comprehensive | Amazon Bedrock | AWS account | - | - | - | - | - |
Reka Core | Reka AI | True multimodal transformer | 128K tokens | $2.00/1M tokens | 67B | Cloud | N/A | Text, image, audio, video | multimodal, comprehensive, startup, innovative, research, all-modalities, advanced, unique, commercial, specialized | api.reka.ai | API key | - | - | - | - | - |
FLUX.1.1 Pro | Black Forest Labs | State-of-the-art image generation | N/A | $0.04/image | 12B | Cloud | N/A | Image generation | image-gen, fast, high-quality, commercial, german, innovative, sota, production, api, popular | bfl.ml, replicate.com | API key | - | - | - | - | - |
Stable Diffusion 3.5 Large | Stability AI | High-quality text-to-image, 2048x2048 | 256-512 tokens | Free <$1M revenue | 12B | Both | 16GB+ VRAM | Image generation | image-gen, open-source, high-res, community, british, accessible, popular, local-capable, commercial, established | Stability AI API | API/Download | - | - | - | - | - |
DALL-E 3 | OpenAI | Integrated text-to-image generation | 4000 chars | $0.04-0.08/image | Not disclosed | Cloud | N/A | Image generation | image-gen, integrated, high-quality, openai, commercial, popular, chatgpt, api, production, established | platform.openai.com | API key | - | - | - | - | - |
Whisper large-v3 | OpenAI | Multilingual speech recognition | 30-sec segments | $0.006/minute | 1.55B | Both | 10GB VRAM | Speech-to-text | speech, multilingual, open-source, transcription, accurate, popular, production, accessible, openai, established | platform.openai.com | API/Download | - | - | - | - | - |
ElevenLabs Turbo v2.5 | ElevenLabs | High-quality, low latency TTS | N/A | 0.5 credits/char | Not disclosed | Cloud | N/A | Text-to-speech | tts, voice-cloning, multilingual, commercial, high-quality, fast, production, popular, innovative, professional | elevenlabs.io | Subscription | - | - | - | - | - |
GitHub Copilot (GPT-4.1) | GitHub/OpenAI | Code completion and assistance | 8K tokens | $20-39/month | Not disclosed | Cloud | N/A | Code, chat | coding, ide-integration, commercial, popular, microsoft, development, productivity, established, professional, tool | github.com/copilot | Subscription | - | - | - | - | - |
StarCoder2 15B | HuggingFace/ServiceNow | Transparent code model, 600+ languages | 16K tokens | Free/open | 15B | Local | 30GB VRAM | Code | coding, open-source, transparent, multilingual, research, community, development, accessible, comprehensive, innovative | huggingface.co | Download | - | ~40% | - | - | - |
Pi (Inflection-2.5) | Inflection AI | Conversational AI with emotional intelligence | Not specified | Freemium | Not disclosed | Cloud | N/A | Text | conversational, emotional, personal, consumer, friendly, accessible, unique, empathetic, casual, assistant | pi.ai | Free/Premium | - | - | - | - | - |
GPT-3.5 Turbo | OpenAI | Cost-effective dialog model | 16K tokens | $0.50 input / $1.50 output | Not disclosed | Cloud | N/A | Text | cost-effective, dialog, established, openai, production, accessible, popular, api, reliable, legacy | platform.openai.com | API key | - | - | - | - | - |
Gemini Nano | Google | On-device deployment model | Mobile-optimized | Free (on-device) | Optimized | Local | Android/Pixel | Text, images | mobile, edge, privacy, google, lightweight, accessible, on-device, efficient, android, consumer | Android SDK | Device compatible | - | - | - | - | - |
Phi-3 Mini 3.8B | Microsoft | Lightweight with 128K context | 128K tokens | Free/open | 3.8B | Local | RTX 3060+ (6GB) | Text | efficient, long-context, microsoft, lightweight, accessible, reasoning, mobile, edge, popular, compact | ollama.com/library | ollama pull phi3:mini | 80.1% | - | - | - | - |
WizardCoder 33B | WizardLM | State-of-the-art code generation | 8K tokens | Free/open | 33B | Local | RTX 4090 (24GB) | Code | coding, high-performance, specialized, community, development, advanced, open-source, popular, tool, powerful | ollama.com/library | ollama pull wizardcoder:33b | - | High | - | - | - |
SQLCoder 15B | Defog | Natural language to SQL | 8K tokens | Free/open | 15B | Local | RTX 3090+ (16GB) | SQL, code | sql, database, specialized, tool, development, conversion, niche, efficient, targeted, professional | ollama.com/library | ollama pull sqlcoder:15b | - | - | - | - | - |
BakLLaVA 7B | SkunkworksAI | Mistral-based vision model | 8K tokens | Free/open | 7B | Local | RTX 3070+ (8GB) | Vision, text | multimodal, mistral-based, vision, community, efficient, innovative, accessible, lightweight, research, experimental | ollama.com/library | ollama pull bakllava:7b | - | - | - | - | - |
MiniCPM-V 8B | OpenBMB | Efficient vision-language model | 8K tokens | Free/open | 8B | Local | RTX 3070+ (8GB) | Vision, text | multimodal, efficient, chinese, vision-language, lightweight, research, accessible, innovative, community, compact | ollama.com/library | ollama pull minicpm-v:8b | - | - | - | - | - |
Moondream 1.8B | Vikhyat Korrapati | Small vision model for edge | 4K tokens | Free/open | 1.8B | Local | RTX 3050+ (4GB) | Vision, text | multimodal, edge, compact, vision, lightweight, mobile, accessible, efficient, community, innovative | ollama.com/library | ollama pull moondream:1.8b | - | - | - | - | - |
Yi 1.5 34B | 01.AI | High-performing bilingual model | 32K tokens | Free/open | 34B | Local | RTX 4090 (24GB) | Text | bilingual, chinese, high-performance, open-source, research, powerful, accessible, community, asian, advanced | ollama.com/library | ollama pull yi:34b | - | - | - | - | - |
StableLM2 12B | Stability AI | Multilingual efficient model | 16K tokens | Free <$1M revenue | 12B | Local | RTX 3080+ (12GB) | Text | multilingual, efficient, stable, british, open-source, european, accessible, community, balanced, production | huggingface.co | Download | - | - | - | - | - |
Nous-Hermes 2 34B | Nous Research | Scientific and coding focused | 32K tokens | Free/open | 34B | Local | RTX 4090 (24GB) | Text | uncensored, scientific, technical, research, community, specialized, powerful, open-source, advanced, professional | ollama.com/library | ollama pull nous-hermes2:34b | - | - | - | - | - |
OpenHermes 2.5 7B | Teknium | GPT-4 quality instructions | 8K tokens | Free/open | 7B | Local | RTX 3060+ (8GB) | Text | uncensored, chat, instruction-following, community, quality, accessible, popular, efficient, versatile, gpt4-trained | ollama.com/library | ollama pull openhermes:7b | - | - | - | - | - |
Dolphin-Mixtral 8x7B | Eric Hartford | Uncensored MoE model | 32K tokens | Free/open | 46.7B | Local | RTX 4090 (24GB) | Text | uncensored, moe, coding, community, powerful, efficient, versatile, open-source, popular, advanced | ollama.com/library | ollama pull dolphin-mixtral:8x7b | - | - | - | - | - |
Magicoder 7B | WizardLM | OSS-Instruct trained model | 16K tokens | Free/open | 7B | Local | RTX 3060+ (8GB) | Code | coding, synthetic-data, innovative, community, specialized, development, research, efficient, tool, unique | ollama.com/library | ollama pull magicoder:7b | - | - | - | - | - |
CodeGemma 7B | Google | Fill-in-middle code completion | 8K tokens | Free/open | 7B | Local | RTX 3060+ (8GB) | Code | code-completion, fill-in-middle, google, lightweight, development, tool, efficient, specialized, accessible, fast | ollama.com/library | ollama pull codegemma:7b | - | - | - | - | - |
Gemma 2 27B | Google | High-performing efficient model | 8K tokens | Free/open | 27B | Local | RTX 4090 (24GB) | Text | efficient, google, versatile, powerful, open-source, research, accessible, community, balanced, production | ollama.com/library | ollama pull gemma2:27b | - | - | - | - | - |
Mistral 7B v0.3 | Mistral AI | Foundation 7B parameter model | 32K tokens | Free/open | 7B | Local | RTX 3060+ (8GB) | Text | foundation, efficient, french, open-source, versatile, popular, accessible, community, production, established | ollama.com/library | ollama pull mistral:7b | ~60% | ~30% | 8.3 | - | - |
Llama 3.2 3B | Meta | Lightweight edge model | 128K tokens | Free/open | 3B | Local | RTX 3050+ (6GB) | Text | lightweight, edge, meta, efficient, accessible, mobile, open-source, popular, small, versatile | ollama.com/library | ollama pull llama3.2:3b | - | - | - | - | - |
Llama 3.2 1B | Meta | Ultra-light edge model | 128K tokens | Free/open | 1B | Local | 2GB RAM/VRAM | Text | ultra-light, edge, meta, mobile, tiny, efficient, accessible, lightweight, minimal, embedded | ollama.com/library | ollama pull llama3.2:1b | - | - | - | - | - |
GPT-4.1 Mini | OpenAI | Balanced GPT-4.1 at lower cost | 1M tokens | $1 input / $4 output | Not disclosed | Cloud | N/A | Text, vision | balanced, cost-effective, multimodal, openai, production, accessible, vision, general, api, popular | platform.openai.com | API key | - | - | - | - | - |
GPT-4.1 Nano | OpenAI | Speed-optimized for autocomplete | 1M tokens | $0.12/1M blended | Not disclosed | Cloud | N/A | Text | fast, autocomplete, classification, openai, specialized, efficient, lightweight, api, production, targeted | platform.openai.com | API key | 80.1% | - | - | - | - |
GPT-4o Mini | OpenAI | 60% cheaper GPT-4o variant | 128K tokens | $0.15 input / $0.60 output | Not disclosed | Cloud | N/A | Text, vision | cost-effective, multimodal, openai, accessible, vision, production, api, popular, efficient, balanced | platform.openai.com | API key | - | - | - | - | - |
o4-mini | OpenAI | Compact efficient reasoning | 200K tokens | $1.10 input / $4.40 output | Not disclosed | Cloud | N/A | Text, vision, code | reasoning, compact, efficient, openai, accessible, vision, code, math, production, balanced | ChatGPT, API | Subscription | - | - | - | - | - |
o1 | OpenAI | Earlier reasoning model | 200K tokens | Premium pricing | Not disclosed | Cloud | N/A | Text, reasoning | reasoning, legacy, openai, research, advanced, problem-solving, premium, established, powerful, specialized | ChatGPT Pro, API | Pro subscription | - | - | 1355 Elo | 73.0% | - |
Claude 3.5 Haiku | Anthropic | Fast, cost-effective model | 200K tokens | $0.80 input / $4 output | Not disclosed | Cloud | N/A | Text, vision | fast, cost-effective, anthropic, lightweight, accessible, vision, production, api, efficient, popular | claude.ai, API | API key | - | - | - | - | - |
Claude 3 Opus | Anthropic | Legacy advanced reasoning | 200K tokens | $15 input / $75 output | Not disclosed | Cloud | N/A | Text, vision | legacy, reasoning, anthropic, powerful, premium, advanced, deprecated, enterprise, high-end, established | Anthropic API | API key | - | - | - | - | - |
Grok 3 Mini | xAI | Efficient Grok 3 variant | 1M tokens | $3 input / $15 output | Not disclosed | Cloud | N/A | Text, images | efficient, reasoning, x-integration, cost-effective, accessible, think-mode, social, web-aware, balanced, popular | grok.x.ai | X Premium+ | 78.9% | - | - | 66.2% | - |
Gemini 1.5 Pro | Google | Legacy high-performance model | 2M tokens | $1.25-2.50 input / $5-10 output | Not disclosed | Cloud | N/A | Text, images, audio, video | legacy, multimodal, long-context, google, comprehensive, deprecated, powerful, established, video, audio | ai.google.dev | Legacy access only | ~87-88% | - | - | - | - |
Gemini 1.5 Flash | Google | Legacy lightweight model | 1M tokens | $0.075 input / $0.30 output | Not disclosed | Cloud | N/A | Text, images, audio, video | legacy, lightweight, efficient, google, multimodal, deprecated, fast, accessible, cost-effective, versatile | ai.google.dev | Legacy access only | 68% | - | - | - | - |
Llama 3 70B | Meta | Earlier Llama generation | 8K tokens | Free/open | 70B | Local | Dual RTX 4090 | Text | legacy, open-source, meta, powerful, research, community, established, large, accessible, popular | llama.meta.com | Download | ~80% | ~81% | - | - | - |
Llama 3 8B | Meta | Compact earlier Llama | 8K tokens | Free/open | 8B | Local | RTX 3070+ (8GB) | Text | legacy, open-source, meta, efficient, accessible, community, established, popular, balanced, versatile | llama.meta.com | Download | ~68% | ~62% | - | - | - |
Llama 3.1 8B | Meta | Updated 8B with 128K context | 128K tokens | Free/open | 8B | Local | RTX 3070+ (8GB) | Text | open-source, long-context, meta, efficient, updated, accessible, popular, community, balanced, versatile | llama.meta.com | Download | 67.6% | - | - | - | - |
Command R7B | Cohere | Smallest Cohere generative model | Not specified | $0.30 input / $1.20 output | 7B | Cloud | N/A | Text | compact, fast, cost-effective, cohere, canadian, efficient, accessible, api, production, lightweight | api.cohere.ai | API key | - | - | - | - | - |
Embed v3.0 | Cohere | High-quality English embeddings | 512 tokens | $0.10/1M tokens | Not disclosed | Cloud | N/A | Embeddings | embeddings, semantic-search, cohere, specialized, english, tool, api, production, efficient, targeted | api.cohere.ai | API key | - | - | - | - | - |
Rerank v3.0 | Cohere | Document ranking for RAG | 4096 tokens | $1.00/1K searches | Not disclosed | Cloud | N/A | Reranking | reranking, rag, search, cohere, specialized, tool, api, production, targeted, efficient | api.cohere.ai | API key | - | - | - | - | - |
Amazon Nova Micro | Amazon | Ultra-low latency text model | Not specified | $0.035 input / $0.14 output | Not disclosed | Cloud | N/A | Text | ultra-fast, cheap, aws, text-only, lightweight, efficient, amazon, cloud, production, minimal | Amazon Bedrock | AWS account | - | - | - | - | - |
Luminous Supreme | Aleph Alpha | European LLM with explainability | Not specified | €1.00/1K tokens | 70B | Cloud | N/A | Text | european, explainable, german, enterprise, gdpr, sovereign, specialized, commercial, regulated, unique | api.aleph-alpha.com | Subscription | - | - | - | - | - |
DALL-E 2 | OpenAI | Budget-friendly image generation | N/A | $0.016-0.02/image | Not disclosed | Cloud | N/A | Image generation | budget, image-gen, legacy, openai, accessible, api, established, cost-effective, basic, popular | platform.openai.com | API key | - | - | - | - | - |
Whisper tiny | OpenAI | Smallest speech model | 30-sec segments | $0.006/minute | 39M | Local | 1GB VRAM | Speech-to-text | tiny, efficient, speech, openai, lightweight, edge, accessible, minimal, fast, embedded | API/Download | API/Download | - | - | - | - | - |
Azure AI Speech | Microsoft | Enterprise speech services | Batch up to 1000hr | $1-15/hour | Not disclosed | Cloud | N/A | Speech-to-text, TTS | enterprise, azure, microsoft, comprehensive, cloud, production, scalable, professional, integrated, commercial | Azure portal | Azure account | - | - | - | - | - |
AssemblyAI Universal-1 | AssemblyAI | Advanced ASR with diarization | Real-time | $0.00037/second | Not disclosed | Cloud | N/A | Speech-to-text | transcription, diarization, real-time, commercial, accurate, professional, api, production, specialized, advanced | assemblyai.com | API key | - | - | - | - | - |
CodeT5+ | Salesforce | Code understanding and generation | 512-1024 tokens | Free/open | Various | Local | 8-16GB VRAM | Code | coding, understanding, generation, salesforce, open-source, research, tool, development, comprehensive, bi-directional | huggingface.co | Download | - | High | - | - | - |
DeepSeek R1 | DeepSeek | Open-source reasoning model | Extended context | $0.55 input / $2.19 output | Various | Both | Varies | Text, reasoning | reasoning, open-source, chinese, affordable, competitive, research, logic, accessible, innovative, popular | API/Download | API/Download | - | - | - | Competitive | - |
Gemini 2.0 Flash | Google | Free experimental multimodal | 1M tokens | Free (experimental) | Not disclosed | Cloud | N/A | Text, images, audio, video | experimental, free, multimodal, google, preview, live-api, real-time, innovative, accessible, beta | ai.google.dev | Google account | - | - | - | - | - |
Midjourney v6.1 | Midjourney | Artistic image generation | ~250 chars | $10-60/month | Not disclosed | Cloud | N/A | Image generation | artistic, aesthetic, discord, subscription, popular, creative, high-quality, community, professional, established | Discord/Web | Subscription | - | - | - | - | - |
ElevenLabs Flash v2.5 | ElevenLabs | Ultra-low latency TTS | N/A | 0.5 credits/char | Not disclosed | Cloud | N/A | Text-to-speech | ultra-fast, tts, 75ms-latency, multilingual, commercial, production, real-time, professional, efficient, innovative | elevenlabs.io | Subscription | - | - | - | - | - |
Google Imagen 4 | Google | Advanced text-to-image | 2048 tokens | Cloud pricing | Not disclosed | Cloud | N/A | Image generation | image-gen, google, advanced, prompt-adherence, cloud, enterprise, production, high-quality, commercial, sota | Vertex AI | Google Cloud | - | - | - | - | - |
Veo 3 | Google | Text-to-video with audio | Extended prompts | Cloud pricing | Not disclosed | Cloud | N/A | Video generation | video-gen, audio, google, innovative, multimodal, advanced, cloud, enterprise, cutting-edge, comprehensive | Vertex AI | Google Cloud | - | - | - | - | - |
Reka Flash | Reka AI | Efficient multimodal model | 128K tokens | $0.35/1M tokens | Not disclosed | Cloud | N/A | Text, image, audio, video | multimodal, efficient, startup, affordable, comprehensive, innovative, all-modalities, accessible, balanced, versatile | api.reka.ai | API key | - | - | - | - | - |
Jurassic-2 Ultra | AI21 Labs | Enterprise baseline model | Not specified | Enterprise pricing | Not disclosed | Cloud | N/A | Text | enterprise, israeli, established, commercial, baseline, professional, production, comprehensive, reliable, advanced | AI21 Studio | Enterprise contact | - | - | - | - | - |
Sonar Deep Research | Perplexity | Exhaustive research model | Not specified | Premium pricing | Not disclosed | Cloud | N/A | Text, search | deep-research, comprehensive, premium, search, analysis, multi-step, professional, advanced, specialized, thorough | Perplexity API | Premium tier | - | - | - | - | - |
o1-mini | OpenAI | Efficient reasoning model | 128K tokens | Premium pricing | Not disclosed | Cloud | N/A | Text, reasoning | reasoning, efficient, openai, accessible, problem-solving, math, science, balanced, popular, production | ChatGPT Pro, API | Pro subscription | - | - | Third place | - | - |
Phi-3.5 Mini 3.8B | Microsoft | Improved multilingual mini model | 128K tokens | Free/open | 3.8B | Local | RTX 3060+ (6GB) | Text | multilingual, efficient, microsoft, updated, lightweight, reasoning, accessible, improved, edge, popular | ollama.com/library | ollama pull phi3.5:3.8b | - | - | - | - | - |
Gemma 3 12B | Google | Current most capable single-GPU model | 128K tokens | Free/open | 12B | Local | RTX 3080+ (12GB) | Text | general-purpose, efficient, google, vision-capable, powerful, accessible, balanced, production, open-source, latest | ollama.com/library | ollama pull gemma3:12b | - | - | - | - | - |
Mistral Small 3.1 24B | Mistral | Vision + 128K context model | 128K tokens | Free/open | 24B | Local | RTX 4090 (24GB) | Text, vision | vision, multimodal, long-context, mistral, french, efficient, accessible, production, open-source, innovative | ollama.com/library | ollama pull mistral-small3.1:24b | 81% | - | - | - | - |
Command-R 35B | Cohere | Long context conversational AI | 128K tokens | Free/open | 35B | Local | RTX 4090 (24GB) | Text | conversational, long-context, enterprise, cohere, canadian, powerful, accessible, open-source, production, rag | ollama.com/library | ollama pull command-r:35b | - | - | - | - | - |
Mixtral 8x7B | Mistral | Mixture of Experts architecture | 32K tokens | Free/open | 46.7B | Local | RTX 4090 (24GB) | Text | moe, high-performance, efficient, mistral, french, innovative, powerful, open-source, popular, accessible | ollama.com/library | ollama pull mixtral:8x7b | ~70% | ~40% | 8.3 | - | - |
rStar-Math 7B | Microsoft | Self-evolved mathematical reasoning | Not specified | Research use | 7B | Local | RTX 3060+ (8GB) | Text, math | math, reasoning, specialized, microsoft, research, compact, powerful, innovative, academic, self-evolving | Research paper | Academic license | - | - | - | - | - |
NeMo ASR | NVIDIA | Enterprise speech recognition | Real-time | Enterprise pricing | Various | Both | NVIDIA GPUs | Speech-to-text | enterprise, nvidia, speech, optimized, commercial, production, scalable, professional, gpu-optimized, accurate | NVIDIA NeMo | Enterprise license | - | - | - | - | - |
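
The Hardware Requirements column lists figures such as "35GB VRAM (Q4)" for Llama 3.3 70B. A rough way to sanity-check such numbers is to multiply the parameter count by the bytes stored per weight. The sketch below is only a back-of-the-envelope estimate: the function name `weight_memory_gb` is made up for illustration, it counts weights only (no KV cache, activations, or runtime overhead), and it assumes exactly 4 or 16 bits per weight, whereas practical 4-bit formats typically store somewhat more than 4 bits per weight.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes.

    Hypothetical helper for illustration; ignores KV cache and runtime overhead.
    """
    bytes_per_weight = bits_per_weight / 8
    # (params_billions * 1e9 weights) * bytes_per_weight / (1e9 bytes per GB)
    return params_billions * bytes_per_weight


# Model sizes taken from the table above; bit widths are assumptions.
for name, size_b in [("Llama 3.3 70B", 70), ("Phi-4 14B", 14), ("Mistral 7B v0.3", 7)]:
    fp16 = weight_memory_gb(size_b, 16)
    q4 = weight_memory_gb(size_b, 4)
    print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{q4:.1f} GB at 4-bit")
```

Under these assumptions, a 70B model needs about 35 GB for its weights at 4 bits, which matches the table entry and explains why it is listed alongside dual-GPU setups rather than a single 24 GB card.
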
The 2025 AI model landscape reveals clear market segments:
1. Enterprise Powerhouses ($3-15/1M tokens)
2. Developer Favorites ($0.50-3/1M tokens)
3. Open-Source Champions (Free)
4. Edge Specialists (<5B parameters)
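
Because the table quotes separate input and output prices per 1M tokens, the cost of a single request depends on the mix of prompt and completion tokens. The following sketch shows the arithmetic; the function name `request_cost_usd` and the 2,000-in / 500-out workload are illustrative assumptions, while the prices are the ones listed in the table.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Cost of one request in USD, given prices quoted per 1M tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# (input price, output price) per 1M tokens, copied from the table above.
prices = {
    "Claude Sonnet 4": (3.00, 15.00),
    "GPT-4o Mini": (0.15, 0.60),
    "Llama 3.3 70B": (0.59, 0.70),
}

# Hypothetical workload: 2,000 prompt tokens and 500 completion tokens.
for model, (p_in, p_out) in prices.items():
    cost = request_cost_usd(2_000, 500, p_in, p_out)
    print(f"{model}: ${cost:.6f} per request")
```

Output tokens are usually priced several times higher than input tokens, so workloads with long completions shift toward the upper end of each segment's price range.
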
Consumer GPU capabilities in 2025:
Geographic distribution shows healthy global competition:
Dramatic price reductions across categories:
Primary: Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro
Budget-Conscious: Mistral Medium 3, Command A
Specialized: Amazon Nova Pro (AWS), Luminous Supreme (EU compliance)
General Purpose: GPT-4o, Llama 3.3 70B
Coding: GitHub Copilot, DeepSeek-Coder V2, CodeLlama
Multimodal: Gemini 2.5 Flash, Reka Core
Reasoning: o3, DeepSeek-R1, Claude 3.7 Sonnet
Open Models: Llama 3.1 405B, Mixtral 8x22B
Specialized: rStar-Math, Nous-Hermes 2
Cloud: ChatGPT (GPT-4o), Claude.ai, Pi
Local: TinyLlama, Phi-3 Mini, Mistral 7B
Creative: Dolphin 3.0, Hermes 3
Image Generation: FLUX.1.1 Pro, Stable Diffusion 3.5, DALL-E 3
Speech: Whisper large-v3, ElevenLabs Turbo v2.5
Video: Veo 3, Amazon Nova Pro
Search: Perplexity Sonar Pro, Grok 3
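
For the locally deployable models, the Access Requirements column gives an `ollama pull` command. Once a model has been pulled, it can be queried over Ollama's local HTTP API. The sketch below is a minimal example under stated assumptions: an Ollama server is running at its default address (`http://localhost:11434`), the `mistral:7b` model from the table has already been pulled, and the helper names `build_generate_request` and `generate` are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Example call (not executed here because it needs a live server):
# print(generate("mistral:7b", "Summarize quantization in one sentence."))
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before anything is sent.
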
The 2025 AI model ecosystem demonstrates:
This comprehensive analysis provides decision-makers with the data needed to select optimal AI models for their specific requirements, balancing performance, cost, and deployment constraints in the rapidly evolving 2025 landscape.