Skip to content

Vision Models

Not all AI models can analyze images and screenshots. This guide shows which models support vision and how to configure them.

All Claude 3+ models support vision by default.

Models with Vision:

  • claude-sonnet-4-5 (Default) - Excellent
  • claude-3-opus-20240229 - Best quality
  • claude-3-sonnet-20240229 - Good
  • claude-3-haiku-20240307 - Fast, cheap

Best For:

  • Code screenshots
  • UI/UX analysis
  • Error debugging
  • General screenshot analysis

Setup: Default model works perfectly - no changes needed!

Vision-specific models required.

Models with Vision:

  • gpt-4-vision-preview - Recommended
  • gpt-4-turbo - Includes vision
  • gpt-4o - Optimized multimodal
  • gpt-4o-mini - Budget option

Best For:

  • Detailed descriptions
  • OCR (text extraction)
  • Chart/graph analysis

Setup: Change model to gpt-4-vision-preview in Settings

Requires vision-specific model.

Models with Vision:

  • gemini-pro-vision - Must use this!
  • gemini-1.5-pro - Latest
  • gemini-1.5-flash - Faster

Best For:

  • Charts and graphs
  • Scientific diagrams
  • Free tier testing

Setup: ⚠️ IMPORTANT - Change model from gemini-pro to gemini-pro-vision in Settings

Many vision models available.

Models with Vision:

  • anthropic/claude-sonnet-4-5 (Default) - Excellent
  • openai/gpt-4-vision-preview
  • google/gemini-pro-vision

Setup: Default model (Claude) works great

These providers cannot analyze screenshots:

  • ❌ Cerebras
  • ❌ Together AI
  • ❌ z.ai
Provider: Claude (Anthropic)
Model: claude-sonnet-4-5 (default is fine)

No changes needed - works out of the box!

Provider: OpenAI
Model: gpt-4-vision-preview (change from default)

Must change model for vision support.

Provider: Google Gemini
Model: gemini-pro-vision (MUST CHANGE from gemini-pro)

Critical: Default model doesn’t support vision!

Best: Claude 3.5 Sonnet

  • Understands syntax highlighting
  • Explains code logic well

Best: Claude or GPT-4 Vision

  • Great design critique
  • Understands layouts

Best: Claude 3.5 Sonnet

  • Understands stack traces
  • Suggests solutions quickly

Best: Gemini Pro Vision or GPT-4 Vision

  • Excellent at data interpretation
  • Extracts numbers accurately

Best: Claude Haiku or GPT-4o Mini

  • Much cheaper
  • Still decent quality

After configuring:

  1. Take a screenshot (Cmd+Shift+4)
  2. Wait 2 seconds
  3. Press Cmd+Shift+A
  4. Select “Explain”
  5. You should see a description of your screenshot

Error: “Model doesn’t support images”

  • Using wrong model (e.g., gemini-pro instead of gemini-pro-vision)
  • Change model in Settings

Error: “Failed to process”

  • API key might not have access to vision models
  • Provider doesn’t support vision
  • Check provider status page
  • Sonnet 4.5: $3-15 per million tokens
  • Opus: $15-75 per million tokens
  • Haiku: $0.25-1.25 per million tokens
  • GPT-4 Vision: $10-30 per million tokens
  • GPT-4o: $5-15 per million tokens
  • GPT-4o Mini: $0.15-0.60 per million tokens
  • Free tier: 60 requests/minute
  • Paid: ~$0.50 per million tokens

For most users:

Provider: Claude (Anthropic)
Model: claude-sonnet-4-5
Why: Best balance of quality, speed, and cost
No configuration needed!