Vision Models

Not all AI models can analyze images and screenshots. This guide shows which models support vision and how to configure them.

Vision-Capable Providers

✅ Claude (Anthropic) - RECOMMENDED

All Claude 3+ models support vision by default.

Models with Vision:

✅ claude-sonnet-4-5 (Default) - Excellent
✅ claude-3-opus-20240229 - Best quality
✅ claude-3-sonnet-20240229 - Good
✅ claude-3-haiku-20240307 - Fast, cheap

Best For:

Code screenshots
UI/UX analysis
Error debugging
General screenshot analysis

Setup: Default model works perfectly - no changes needed!

✅ OpenAI - EXCELLENT

Vision-specific models required.

Models with Vision:

✅ gpt-4-vision-preview - Recommended
✅ gpt-4-turbo - Includes vision
✅ gpt-4o - Optimized multimodal
✅ gpt-4o-mini - Budget option

Best For:

Detailed descriptions
OCR (text extraction)
Chart/graph analysis

Setup: Change model to gpt-4-vision-preview in Settings

✅ Google Gemini - GOOD

Requires vision-specific model.

Models with Vision:

✅ gemini-pro-vision - Must use this!
✅ gemini-1.5-pro - Latest
✅ gemini-1.5-flash - Faster

Best For:

Charts and graphs
Scientific diagrams
Free tier testing

Setup: ⚠️ IMPORTANT - Change model from gemini-pro to gemini-pro-vision in Settings

✅ OpenRouter - FLEXIBLE

Many vision models available.

Models with Vision:

✅ anthropic/claude-sonnet-4-5 (Default) - Excellent
✅ openai/gpt-4-vision-preview
✅ google/gemini-pro-vision

Setup: Default model (Claude) works great

❌ No Vision Support

These providers cannot analyze screenshots:

❌ Cerebras
❌ Together AI
❌ z.ai

Quick Setup Guide

For Claude (Easiest)

Provider: Claude (Anthropic)
Model: claude-sonnet-4-5 (default is fine)

No changes needed - works out of the box!

For OpenAI

Provider: OpenAI
Model: gpt-4-vision-preview (change from default)

Must change model for vision support.

For Gemini (Free Tier)

Provider: Google Gemini
Model: gemini-pro-vision (MUST CHANGE from gemini-pro)

Critical: Default model doesn’t support vision!

Use Case Recommendations

Code Screenshots

Best: Claude 3.5 Sonnet

Understands syntax highlighting
Explains code logic well

UI/UX Design

Best: Claude or GPT-4 Vision

Great design critique
Understands layouts

Error Debugging

Best: Claude 3.5 Sonnet

Understands stack traces
Suggests solutions quickly

Charts & Graphs

Best: Gemini Pro Vision or GPT-4 Vision

Excellent at data interpretation
Extracts numbers accurately

Budget Option

Best: Claude Haiku or GPT-4o Mini

Much cheaper
Still decent quality

Testing Your Setup

After configuring:

Take a screenshot (Cmd+Shift+4)
Wait 2 seconds
Press Cmd+Shift+A
Select “Explain”
You should see a description of your screenshot

If It Doesn’t Work

Error: “Model doesn’t support images”

Using wrong model (e.g., gemini-pro instead of gemini-pro-vision)
Change model in Settings

Error: “Failed to process”

API key might not have access to vision models
Provider doesn’t support vision
Check provider status page

Cost Comparison

Claude Models

Sonnet 4.5: $3-15 per million tokens
Opus: $15-75 per million tokens
Haiku: $0.25-1.25 per million tokens

OpenAI Models

GPT-4 Vision: $10-30 per million tokens
GPT-4o: $5-15 per million tokens
GPT-4o Mini: $0.15-0.60 per million tokens

Gemini Models

Free tier: 60 requests/minute
Paid: ~$0.50 per million tokens

Best Overall Setup

For most users:

Provider: Claude (Anthropic)
Model: claude-sonnet-4-5
Why: Best balance of quality, speed, and cost
No configuration needed!

Next Steps

Screenshots Guide: Learn to use screenshot features
Settings: Configure your chosen provider
AI Providers: Compare all providers