Vision Models
Vision Models
Section titled “Vision Models”Not all AI models can analyze images and screenshots. This guide shows which models support vision and how to configure them.
Vision-Capable Providers
Section titled “Vision-Capable Providers”✅ Claude (Anthropic) - RECOMMENDED
Section titled “✅ Claude (Anthropic) - RECOMMENDED”All Claude 3+ models support vision by default.
Models with Vision:
- ✅
claude-sonnet-4-5(Default) - Excellent - ✅
claude-3-opus-20240229- Best quality - ✅
claude-3-sonnet-20240229- Good - ✅
claude-3-haiku-20240307- Fast, cheap
Best For:
- Code screenshots
- UI/UX analysis
- Error debugging
- General screenshot analysis
Setup: Default model works perfectly - no changes needed!
✅ OpenAI - EXCELLENT
Section titled “✅ OpenAI - EXCELLENT”Vision-specific models required.
Models with Vision:
- ✅
gpt-4-vision-preview- Recommended - ✅
gpt-4-turbo- Includes vision - ✅
gpt-4o- Optimized multimodal - ✅
gpt-4o-mini- Budget option
Best For:
- Detailed descriptions
- OCR (text extraction)
- Chart/graph analysis
Setup: Change model to gpt-4-vision-preview in Settings
✅ Google Gemini - GOOD
Section titled “✅ Google Gemini - GOOD”Requires vision-specific model.
Models with Vision:
- ✅
gemini-pro-vision- Must use this! - ✅
gemini-1.5-pro- Latest - ✅
gemini-1.5-flash- Faster
Best For:
- Charts and graphs
- Scientific diagrams
- Free tier testing
Setup: ⚠️ IMPORTANT - Change model from gemini-pro to gemini-pro-vision in Settings
✅ OpenRouter - FLEXIBLE
Section titled “✅ OpenRouter - FLEXIBLE”Many vision models available.
Models with Vision:
- ✅
anthropic/claude-sonnet-4-5(Default) - Excellent - ✅
openai/gpt-4-vision-preview - ✅
google/gemini-pro-vision
Setup: Default model (Claude) works great
❌ No Vision Support
Section titled “❌ No Vision Support”These providers cannot analyze screenshots:
- ❌ Cerebras
- ❌ Together AI
- ❌ z.ai
Quick Setup Guide
Section titled “Quick Setup Guide”For Claude (Easiest)
Section titled “For Claude (Easiest)”Provider: Claude (Anthropic)Model: claude-sonnet-4-5 (default is fine)No changes needed - works out of the box!
For OpenAI
Section titled “For OpenAI”Provider: OpenAIModel: gpt-4-vision-preview (change from default)Must change model for vision support.
For Gemini (Free Tier)
Section titled “For Gemini (Free Tier)”Provider: Google GeminiModel: gemini-pro-vision (MUST CHANGE from gemini-pro)Critical: Default model doesn’t support vision!
Use Case Recommendations
Section titled “Use Case Recommendations”Code Screenshots
Section titled “Code Screenshots”Best: Claude 3.5 Sonnet
- Understands syntax highlighting
- Explains code logic well
UI/UX Design
Section titled “UI/UX Design”Best: Claude or GPT-4 Vision
- Great design critique
- Understands layouts
Error Debugging
Section titled “Error Debugging”Best: Claude 3.5 Sonnet
- Understands stack traces
- Suggests solutions quickly
Charts & Graphs
Section titled “Charts & Graphs”Best: Gemini Pro Vision or GPT-4 Vision
- Excellent at data interpretation
- Extracts numbers accurately
Budget Option
Section titled “Budget Option”Best: Claude Haiku or GPT-4o Mini
- Much cheaper
- Still decent quality
Testing Your Setup
Section titled “Testing Your Setup”After configuring:
- Take a screenshot (
Cmd+Shift+4) - Wait 2 seconds
- Press
Cmd+Shift+A - Select “Explain”
- You should see a description of your screenshot
If It Doesn’t Work
Section titled “If It Doesn’t Work”Error: “Model doesn’t support images”
- Using wrong model (e.g.,
gemini-proinstead ofgemini-pro-vision) - Change model in Settings
Error: “Failed to process”
- API key might not have access to vision models
- Provider doesn’t support vision
- Check provider status page
Cost Comparison
Section titled “Cost Comparison”Claude Models
Section titled “Claude Models”- Sonnet 4.5: $3-15 per million tokens
- Opus: $15-75 per million tokens
- Haiku: $0.25-1.25 per million tokens
OpenAI Models
Section titled “OpenAI Models”- GPT-4 Vision: $10-30 per million tokens
- GPT-4o: $5-15 per million tokens
- GPT-4o Mini: $0.15-0.60 per million tokens
Gemini Models
Section titled “Gemini Models”- Free tier: 60 requests/minute
- Paid: ~$0.50 per million tokens
Best Overall Setup
Section titled “Best Overall Setup”For most users:
Provider: Claude (Anthropic)Model: claude-sonnet-4-5Why: Best balance of quality, speed, and costNo configuration needed!Next Steps
Section titled “Next Steps”- Screenshots Guide: Learn to use screenshot features
- Settings: Configure your chosen provider
- AI Providers: Compare all providers