
AWS Bedrock Models 2026: Every LLM Available and What It Costs

Complete guide to all LLM models on Amazon Bedrock in 2026. Compare Claude, Llama, Nova, DeepSeek, Mistral, and Gemma pricing, capabilities, and use cases — from $0.035 to $30 per 1M tokens.

Wring Team
March 8, 2026
18 min read
Tags: AWS Bedrock, LLM, AI models, Claude, Llama, Amazon Nova, generative AI, model pricing

Amazon Bedrock gives you access to over 40 foundation models from 10+ providers through a single API — no infrastructure to manage, no GPUs to provision, no model weights to download. You pay per token, and AWS handles everything else.

The problem? Choosing the right model. The pricing spread is absurd: Amazon Nova Micro costs $0.035 per million input tokens. Claude 3.5 Sonnet costs $6.00 for the same million tokens — that's a 171x difference. Pick wrong and you'll either overspend on capabilities you don't need, or under-deliver with a model that can't handle the job.

72% of organizations now use generative AI cloud services, up from 47% in 2024 (Flexera, 2025). AWS generated $128.7 billion in revenue in 2025, with its fastest growth in 13 quarters (Amazon IR, 2026). Bedrock is a major driver of that growth.

TL;DR: Bedrock offers 40+ models from Amazon, Anthropic, Meta, DeepSeek, Google, Mistral, Cohere, AI21 Labs, MiniMax, and Stability AI. For most use cases, start with Amazon Nova Micro ($0.035/$0.14 per 1M tokens) for simple tasks and Nova Pro ($0.80/$3.20) for complex reasoning. Use Claude 3.5 Sonnet ($6/$30) only when you need top-tier intelligence. Batch inference saves 50% across all models.


How Bedrock Pricing Works

Before diving into individual models, you need to understand how Bedrock charges you. There are four pricing modes:

| Pricing Mode | How It Works | Best For | Savings |
| --- | --- | --- | --- |
| On-Demand | Pay per token, no commitment | Variable workloads, prototyping | Baseline |
| Batch | Submit jobs, get results later (up to 24 hr) | Large-scale processing, data pipelines | 50% off |
| Provisioned Throughput | Reserve model capacity per hour | Predictable, high-volume production | Up to 30-40% |
| Cross-Region Inference | Route to regions with available capacity | Avoiding throttling, higher throughput | Varies |

The golden rule: If your workload can tolerate latency, always use batch inference. A flat 50% discount on every model is hard to beat.

Two key terms:

  • Input tokens = what you send to the model (your prompt, context, documents)
  • Output tokens = what the model generates (its response). Usually more expensive than input tokens, though some Llama models price both the same.
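To make the token math concrete, here's a minimal cost helper (a sketch, not an official calculator; prices are the per-1M-token rates quoted throughout this guide):

```python
def bedrock_cost(input_tokens, output_tokens, input_price, output_price, batch=False):
    """Estimate Bedrock cost in USD. Prices are USD per 1M tokens."""
    cost = (input_tokens / 1e6) * input_price + (output_tokens / 1e6) * output_price
    return cost * 0.5 if batch else cost  # batch inference: flat 50% discount

# Example: Nova Pro ($0.80 input / $3.20 output), 10M input + 2M output tokens
on_demand = bedrock_cost(10_000_000, 2_000_000, 0.80, 3.20)            # $14.40
batched = bedrock_cost(10_000_000, 2_000_000, 0.80, 3.20, batch=True)  # $7.20
```

Swapping the two price arguments is all it takes to compare models on the same workload.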

Amazon Nova Models — The Budget Kings

Amazon's own model family launched at re:Invent 2024 and has become the default choice for cost-conscious teams. These models are built to be cheap and fast.

Amazon Nova output cost per 1M tokens (Source: AWS Bedrock Pricing, 2025-2026): Nova Micro $0.14 (text only, fastest), Nova Lite $0.24 (multimodal, budget pick), Nova Pro $3.20 (best balance for agents), Nova Premier $15.00 (top-tier reasoning). Nova Micro is 107x cheaper than Nova Premier per output token.
| Model | Input / 1M Tokens | Output / 1M Tokens | Context Window | Modality | Best For |
| --- | --- | --- | --- | --- | --- |
| Nova Micro | $0.035 | $0.14 | 128K | Text only | Quick Q&A, classification, routing |
| Nova Lite | $0.06 | $0.24 | 300K | Text, image, video | Document analysis, image understanding |
| Nova Pro | $0.80 | $3.20 | 300K | Text, image, video | Complex reasoning, agents, RAG |
| Nova Premier | $2.00 | $15.00 | 1M+ | Text, image, video | Hardest tasks, distillation teacher |

Our take: Nova Micro at $0.035/M input is the cheapest input rate on Bedrock, period. We've seen customers cut their Bedrock bill by 60-70% just by routing simple classification and extraction tasks to Nova Micro instead of Claude.

When to Use Nova

  • Nova Micro: Chatbots, FAQ bots, intent classification, data extraction from structured text. If the task doesn't require "thinking," Nova Micro handles it.
  • Nova Lite: When you need to process images or videos alongside text. Product image analysis, document OCR, video summarization. The 300K context window is huge for document-heavy workflows.
  • Nova Pro: The sweet spot for AI agents, complex RAG pipelines, and multi-step reasoning. Roughly 10x cheaper than Claude Sonnet for comparable agentic tasks.
  • Nova Premier: Only use this when you're fine-tuning other models using Nova Premier as the teacher, or for the absolute hardest reasoning tasks.
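As a rough illustration of that routing win, compare a classification workload on Nova Micro versus Claude 3 Haiku. The volumes and token counts below are hypothetical; the prices come from the tables in this guide:

```python
def monthly_cost(requests, input_tokens, output_tokens, input_price, output_price):
    """Monthly USD cost for a fixed per-request token profile; prices per 1M tokens."""
    per_request = (input_tokens / 1e6) * input_price + (output_tokens / 1e6) * output_price
    return requests * per_request

requests = 100_000 * 30  # hypothetical: 100K classifications/day for a month
claude_haiku = monthly_cost(requests, 400, 20, 0.25, 1.25)  # $375.00
nova_micro = monthly_cost(requests, 400, 20, 0.035, 0.14)   # $50.40
# Moving just this one task off Claude cuts its cost by roughly 87%
```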

Anthropic Claude Models — The Intelligence Leaders

Anthropic's Claude models consistently rank at the top of reasoning benchmarks. They're the most expensive text models on Bedrock, but for tasks that demand deep understanding, nuanced analysis, or complex code generation, nothing else comes close.

| Model | Input / 1M Tokens | Output / 1M Tokens | Context Window | Strengths |
| --- | --- | --- | --- | --- |
| Claude 3 Haiku | $0.25 | $1.25 | 200K | Fast, cheap, good at structured output |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200K | Upgraded Haiku with better reasoning |
| Claude 3.5 Sonnet v2 | $3.00 | $15.00 | 200K | Best price/performance for complex tasks |
| Claude 3.5 Sonnet | $6.00 | $30.00 | 200K | Extended thinking, deepest reasoning |
| Claude Opus 4 | $15.00 | $75.00 | 200K | Most capable, highest cost |

Claude's pricing includes a prompt caching feature that can dramatically reduce costs for repetitive workloads:

  • Cache write: $7.50/M tokens (one-time cost to cache a prompt)
  • Cache read: $0.60/M tokens (90% cheaper than re-sending the prompt)

If you're sending the same system prompt or document context across hundreds of requests, prompt caching can cut your Claude costs by 80-90% on the cached portion.
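The break-even math is worth sketching. This is a simplified model: it uses the $6/M input rate quoted above and assumes the cache stays warm across all requests (in practice Bedrock's prompt cache expires after a few minutes of inactivity, so bursty traffic matters):

```python
def cached_prompt_cost(requests, cached_tokens, input_price=6.00,
                       write_price=7.50, read_price=0.60):
    """USD cost of the shared prompt portion, without vs. with prompt caching."""
    m = cached_tokens / 1e6
    without_cache = requests * m * input_price  # resend the full prompt every time
    with_cache = m * write_price + (requests - 1) * m * read_price  # write once, read after
    return without_cache, with_cache

# 50K-token shared context reused across 500 requests
without, cached = cached_prompt_cost(500, 50_000)  # $150.00 vs ~$15.35
```

Even with the pricier cache write, caching wins after just two requests at these rates.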

When to Use Claude

  • Claude 3 Haiku: Simple structured extraction, JSON generation, summarization where format matters more than depth
  • Claude 3.5 Sonnet v2: The workhorse — code generation, analysis, complex writing, tool use. Best model for the price when reasoning quality matters
  • Claude Opus 4: Only for tasks where Sonnet genuinely fails — extremely complex multi-step reasoning, nuanced legal/medical text analysis, or when accuracy justifies the 5x cost premium

Meta Llama Models — Open-Source on Bedrock

Meta's Llama family is the most popular open-source model ecosystem. Running them through Bedrock means you get the open-source flexibility without managing GPU infrastructure.

| Model | Input / 1M Tokens | Output / 1M Tokens | Context Window | Parameters |
| --- | --- | --- | --- | --- |
| Llama 3.1 8B Instruct | $0.22 | $0.22 | 128K | 8B |
| Llama 3.1 70B Instruct | $0.72 | $0.72 | 128K | 70B |
| Llama 3.1 405B Instruct | $2.13 | $2.13 | 128K | 405B |
| Llama 3.2 1B/3B | $0.10 | $0.10 | 128K | 1-3B |
| Llama 3.3 70B | $0.72 | $0.72 | 128K | 70B |
| Llama 4 Scout | $0.17 | $0.17 | 10M | 17B active (109B total) |
| Llama 4 Maverick | $0.20 | $0.80 | 1M | 17B active (400B total) |

Notice something unusual? Most Llama models charge the same price for input and output tokens. That's different from every other provider on Bedrock and makes cost estimation much simpler.
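One practical consequence: with symmetric pricing you only need to estimate total token volume, not the input/output split. A trivial sketch:

```python
def llama_cost(total_tokens, price_per_1m):
    """With equal input/output rates, cost depends only on total token volume."""
    return (total_tokens / 1e6) * price_per_1m

# Llama 3.3 70B at $0.72/M: 5M tokens in any input/output mix
cost = llama_cost(5_000_000, 0.72)  # $3.60
```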

Llama 4: The New Generation

Llama 4 Scout and Maverick are Meta's latest release and deserve special attention:

  • Llama 4 Scout: A Mixture-of-Experts (MoE) model with 17B active parameters out of 109B total. The 10 million token context window is the largest on Bedrock — 50x larger than Claude. At $0.17/M tokens, it's cheaper than most alternatives.
  • Llama 4 Maverick: Larger MoE model (400B total) with stronger reasoning. The 1M context window and $0.80 output pricing put it in direct competition with Nova Pro.

When to Use Llama

  • Llama 3.2 1B/3B: Lightest possible model for simple tasks. At a flat $0.10/M tokens, only Nova Micro (on input) and Gemma 3 4B (on output) are cheaper.
  • Llama 3.3 70B: Solid all-rounder at $0.72/M tokens. Good alternative to Nova Pro if you prefer open-source.
  • Llama 4 Scout: When you need massive context windows (10M tokens = entire codebases, book-length documents).
  • Llama 4 Maverick: When you want strong reasoning from an open-source model without paying Claude prices.

DeepSeek, Gemma, Mistral, and More

Bedrock also hosts models from several other providers. Here are the most notable:

DeepSeek

| Model | Input / 1M Tokens | Output / 1M Tokens | Notes |
| --- | --- | --- | --- |
| DeepSeek V3.2 | $0.62 | $1.85 | Strong reasoning, competitive with Claude at 1/5 the price |

DeepSeek V3.2 is a sleeper pick. It punches well above its price class on coding and reasoning benchmarks. If you're building code generation pipelines, test DeepSeek before defaulting to Claude.

Google Gemma

| Model | Input / 1M Tokens | Output / 1M Tokens | Parameters |
| --- | --- | --- | --- |
| Gemma 3 4B | $0.04 | $0.08 | 4B |
| Gemma 3 12B | $0.09 | $0.29 | 12B |
| Gemma 3 27B | $0.23 | $0.38 | 27B |

Google's Gemma models are extremely cheap and surprisingly capable for their size. Gemma 3 4B at $0.04/$0.08 has the lowest output price on Bedrock and trails only Nova Micro on input price.

Mistral AI

| Model | Input / 1M Tokens | Output / 1M Tokens | Notes |
| --- | --- | --- | --- |
| Mistral 7B Instruct | $0.15 | $0.20 | Budget European model |
| Mixtral 8x7B | $0.45 | $0.70 | MoE architecture |
| Mistral Large | $2.00 | $6.00 | Flagship, multilingual |

Mistral is popular with European companies due to EU data sovereignty considerations. Mistral Large handles multilingual tasks (especially French, German, Spanish) better than most US-trained models.

MiniMax

| Model | Input / 1M Tokens | Output / 1M Tokens |
| --- | --- | --- |
| MiniMax M2 | $0.30 | $1.20 |
| MiniMax M2.1 | $0.30 | $1.20 |

Cohere

| Model | Pricing | Notes |
| --- | --- | --- |
| Rerank 3.5 | $2.00 per 1,000 queries | Specialized for search ranking, not general text |
| Command R | Usage-based | Document grounding and RAG |
| Command R+ | Usage-based | Flagship, best for enterprise RAG |

AI21 Labs

| Model | Notes |
| --- | --- |
| Jamba 1.5 Mini | SSM-Transformer hybrid, 256K context |
| Jamba 1.5 Large | Longer documents, structured output |

Stability AI

| Model | Notes |
| --- | --- |
| Stable Diffusion XL 1.0 | Image generation, $0.04 per image |
| Stable Image Core | Higher quality images, $0.04 per image |
| Stable Image Ultra | Highest quality, $0.08 per image |

The Complete Pricing Comparison

Here's the full landscape sorted by output cost — the number that usually matters most for your bill:

Bedrock models by output cost per 1M tokens (Source: AWS Bedrock Pricing, March 2026; lower = cheaper):

| Model | Output / 1M Tokens |
| --- | --- |
| Gemma 3 4B | $0.08 |
| Llama 3.2 1B | $0.10 |
| Nova Micro | $0.14 |
| Llama 4 Scout | $0.17 |
| Mistral 7B | $0.20 |
| Llama 3.3 70B | $0.72 |
| Llama 4 Maverick | $0.80 |
| MiniMax M2 | $1.20 |
| Claude 3 Haiku | $1.25 |
| DeepSeek V3.2 | $1.85 |
| Nova Pro | $3.20 |
| Claude 3.5 Haiku | $4.00 |
| Mistral Large | $6.00 |
| Nova Premier | $15.00 |
| Claude 3.5 Sonnet | $30.00 |
| Claude Opus 4 | $75.00 |

The cheapest model (Gemma 3 4B) is 937x cheaper than the most expensive (Claude Opus 4). All prices are On-Demand; batch inference offers a 50% discount on all models.

How to Pick the Right Model

Don't overthink this. Match the task complexity to the model tier:

| Task Complexity | Example Tasks | Recommended Model | Output Cost / 1M |
| --- | --- | --- | --- |
| Trivial | Intent classification, spam detection, simple extraction | Gemma 3 4B or Nova Micro | $0.08-0.14 |
| Simple | Summarization, FAQ answering, formatting | Llama 4 Scout or Nova Lite | $0.17-0.24 |
| Moderate | RAG responses, email drafting, data analysis | Llama 3.3 70B or DeepSeek V3.2 | $0.72-1.85 |
| Complex | AI agents, multi-step reasoning, code generation | Nova Pro or Claude 3.5 Haiku | $3.20-4.00 |
| Expert | Legal analysis, medical reasoning, research synthesis | Claude 3.5 Sonnet v2 | $15.00 |
| Frontier | The hardest 1% of tasks only | Claude Opus 4 | $75.00 |

The 80/20 rule for model selection: In our experience, 80% of production workloads can run on models costing less than $1/M output tokens. The remaining 20% need Claude-tier intelligence. Smart teams use a router — send easy requests to Nova Micro, route complex ones to Claude. This typically cuts Bedrock spend by 50-70% compared to running everything through one model.
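The blended-cost arithmetic behind that claim is easy to sketch. The 80/20 split and model pairing below are illustrative; real savings depend on your traffic mix:

```python
def blended_output_cost(cheap_share, cheap_price, premium_price):
    """Average output cost per 1M tokens when a router splits traffic two ways."""
    return cheap_share * cheap_price + (1 - cheap_share) * premium_price

all_premium = blended_output_cost(0.0, 0.14, 15.00)  # everything on Claude 3.5 Sonnet v2: $15.00/M
routed = blended_output_cost(0.8, 0.14, 15.00)       # 80% on Nova Micro: ~$3.11/M
# Roughly 79% lower output cost in this idealized split
```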


Bedrock Features Beyond Model Access

Bedrock isn't just an API gateway for LLMs. Several built-in features affect your total cost:

Knowledge Bases (RAG)

Bedrock Knowledge Bases let you connect your documents (S3, Confluence, SharePoint) and automatically retrieve relevant context for each query. This is managed RAG — no vector database to maintain.

Pricing: Storage in your chosen vector store (OpenSearch Serverless, Aurora, Pinecone) + model inference costs for embedding and generation.

Agents

Bedrock Agents let you build autonomous AI workflows that call APIs, query databases, and chain multiple steps. Think of them as serverless AI agents — you define the tools, Bedrock orchestrates the execution.

Pricing: Model inference costs for each step, plus any API/Lambda costs your agent invokes.

Guardrails

Content filtering, PII redaction, and topic blocking built into the API layer. Essential for production deployments.

Pricing: Small per-request fee on top of model costs.

Model Evaluation

Compare models side-by-side on your own test data before committing. This prevents the most expensive mistake: picking a model that's too powerful (and too expensive) for your actual workload.


Cost Optimization Strategies for Bedrock

84% of organizations say managing cloud spend is their top challenge (Flexera, 2025). Bedrock is no exception. Here's how to keep costs under control:

1. Use Batch Inference Wherever Possible

If you don't need real-time responses, batch inference saves 50% on every model. Ideal for: nightly report generation, document processing pipelines, bulk classification, email triage systems.

2. Implement Model Routing

Don't use one model for everything. Route requests based on complexity:

  • Simple extraction → Nova Micro ($0.14/M output)
  • General conversation → Nova Lite ($0.24/M output)
  • Complex reasoning → Claude 3.5 Sonnet v2 ($15/M output)

A basic routing layer (even a simple keyword classifier) can cut your Bedrock bill by 50-70%.
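A minimal router really can be a keyword heuristic. The model IDs below are illustrative Bedrock identifiers, and the keyword list is an assumption you'd tune to your own traffic (check the Bedrock console for current model IDs in your region):

```python
COMPLEX_HINTS = ("analyze", "explain why", "write code", "step by step", "debug")

def pick_model(prompt: str) -> str:
    """Naive complexity router: keywords and prompt length decide the model tier."""
    text = prompt.lower()
    if any(hint in text for hint in COMPLEX_HINTS):
        return "anthropic.claude-3-5-sonnet-20241022-v2:0"  # complex reasoning
    if len(text.split()) > 200:
        return "amazon.nova-lite-v1:0"  # long context, but a simple task
    return "amazon.nova-micro-v1:0"     # cheap default

model_id = pick_model("Classify this support email as billing, bug, or other")
# → "amazon.nova-micro-v1:0"
```

In production you'd pass `model_id` straight to the Bedrock Converse API, since the request shape is the same across models.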

3. Use Prompt Caching with Claude

If you're using Claude models with a long system prompt or shared context, enable prompt caching. Cache write costs $7.50/M tokens once, but subsequent reads cost only $0.60/M tokens — 90% cheaper than resending the full prompt every time.

4. Monitor with CloudWatch

Set up CloudWatch alarms for Bedrock API costs. A runaway agent loop can burn through hundreds of dollars in minutes. Set a daily spend threshold and get alerted before it becomes a problem.

5. Compress Your Prompts

Most prompts contain unnecessary context. Trimming prompt length by 30% directly reduces input token costs by 30%. Review your system prompts monthly and remove anything the model doesn't actually need.

What we see at Wring: Customers who implement model routing + batch inference typically reduce their Bedrock spend by 60-75% within the first month. The biggest wins come from moving classification tasks off Claude onto Nova Micro — same accuracy, 100x cheaper.


Frequently Asked Questions

Is there a free tier for Amazon Bedrock?

Yes. New AWS accounts get a limited free tier for select Bedrock models. The exact allocation varies by model and changes periodically — check the Bedrock pricing page for current free tier details. Beyond that, you pay per token with no minimum commitment.

Can I fine-tune models on Bedrock?

Yes, Bedrock supports custom model training (fine-tuning) for select models including Amazon Titan and certain Llama variants. You pay for training time (per model unit per hour) plus hosting the custom model. Fine-tuning is worth exploring if you have domain-specific data that improves output quality enough to justify the upfront training cost.

How does Bedrock pricing compare to OpenAI's API?

Bedrock's pricing is generally competitive. Claude 3.5 Sonnet v2 on Bedrock ($3/$15 on-demand) competes directly with GPT-4o ($2.50/$10). The advantage of Bedrock is model choice — you're not locked into one provider. If OpenAI raises prices, you switch to Nova or Llama. If Anthropic releases a better model, you switch to Claude. Multi-model access is built in.

What's the cheapest way to run an AI chatbot on Bedrock?

Amazon Nova Micro at $0.035 input / $0.14 output per 1M tokens. For a chatbot handling 1,000 conversations per day with ~500 tokens per response, that's roughly $2.10/month in Bedrock costs. Add a Lightsail or Lambda backend and you're running a production chatbot for under $25/month total.
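The arithmetic behind that estimate, output side only (input at $0.035/M adds pennies):

```python
conversations_per_day = 1_000
output_tokens_per_response = 500
days = 30

monthly_output_tokens = conversations_per_day * output_tokens_per_response * days  # 15M tokens
output_cost = (monthly_output_tokens / 1e6) * 0.14  # Nova Micro output rate: $2.10/month
```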

Can I use multiple models in the same application?

Absolutely. Bedrock's API is model-agnostic — switch models by changing the model ID in your API call. Many production applications use 2-3 models: a cheap model for simple tasks, a mid-tier model for general work, and a premium model for complex reasoning. Bedrock's Intelligent Prompt Routing feature can even automate this selection.


Start Building with the Right Model

Bedrock's model catalog is massive and growing. Don't get paralyzed by choice. Start here:

  1. Prototype with Nova Micro — it's the cheapest and fastest. If it can't handle your task, you'll know immediately
  2. Upgrade to Nova Pro or Llama 3.3 70B for tasks that need more reasoning
  3. Reserve Claude for the complex 20% of tasks that actually need it
  4. Enable batch inference for any non-real-time workload — instant 50% savings
  5. Use Wring to optimize the rest of your AWS bill so you have more budget for Bedrock experimentation

The gap between the cheapest and most expensive model is 937x. That's not a rounding error — it's the difference between a $50/month AI bill and a $46,000/month one. Choose wisely.