May 26, 2026 · 7 min read

Last updated on May 28, 2026

DeepSeek V4 Pro Makes 75% Discount Permanent: Now $0.87/1M Output Tokens

DeepSeek just made the 75% discount on V4 Pro permanent. Announced May 23, 2026, the promotional pricing that was set to expire May 31 is now the standard rate. V4 Pro output tokens cost $0.87 per million. Input tokens cost $0.435 per million. These are not introductory rates. This is the price.

For context: GPT-5.5 charges $30 per million output tokens. DeepSeek V4 Pro now costs 34.5x less for output. On a model that scores within striking distance on every major benchmark.

If you are building anything that touches an LLM API, this changes your cost math permanently. Here is what happened, what it means, and how it compares to everything else on the market.

What changed

DeepSeek launched V4 Pro on April 24, 2026 with prices of $1.74/1M input tokens and $3.48/1M output tokens. Within days, they announced a 75% promotional discount, dropping prices to $0.435 input and $0.87 output. The promo was originally set to end May 31.

On May 23, Reuters and Engadget reported that DeepSeek made the discount permanent. No expiration date. No catch. The promotional price is now the standard price.

Here is the before and after:

	Original price	New permanent price	Discount
Input tokens (per 1M)	$1.74	$0.435	75% off
Output tokens (per 1M)	$3.48	$0.87	75% off
Cache hit input (per 1M)	$0.145	$0.036	75% off

V4 Flash also got permanent discounts. Its new rates are $0.07/1M input and $0.28/1M output, making it one of the cheapest frontier-capable models available anywhere. See our V4 Flash analysis for the full breakdown on that tier.

Pricing comparison: DeepSeek V4 Pro vs the field

Here is how V4 Pro’s permanent pricing stacks up against every major API provider as of May 2026:

Model	Input (per 1M)	Output (per 1M)	Output cost vs V4 Pro
DeepSeek V4 Pro	$0.435	$0.87	1x (baseline)
DeepSeek V4 Flash	$0.07	$0.28	0.32x
GPT-5.5	$15.00	$30.00	34.5x more
GPT-5.4	$5.00	$15.00	17.2x more
Claude Opus 4.7	$15.00	$75.00	86.2x more
Claude Sonnet 4.6	$3.00	$15.00	17.2x more
Gemini 3.5 Flash	$0.15	$0.60	0.69x
Gemini 3.1 Pro	$3.50	$10.50	12.1x more
Qwen 3.7 Max	$1.60	$6.40	7.4x more

The gap is staggering. Claude Opus 4.7 costs 86x more per output token. GPT-5.5 costs 34.5x more. Even mid-tier models like Claude Sonnet and GPT-5.4 are 17x more expensive on output.

Only Gemini 3.5 Flash and DeepSeek’s own V4 Flash are cheaper, and neither matches V4 Pro’s benchmark scores on complex reasoning and coding tasks.

What this means for developers

The permanent pricing removes the last hesitation developers had about building on V4 Pro. When it was a promo, you had to plan for a 4x price increase. Now you can budget with confidence.

Cost scenarios

Here is what real workloads cost at the new permanent rates:

Coding agent session (30 min, heavy use):

~500K input tokens, ~100K output tokens
Cost: $0.22 + $0.09 = $0.31 per session
With 90% cache hits: $0.05 + $0.09 = $0.14 per session

RAG pipeline (1000 queries/day):

~2M input tokens/day, ~500K output tokens/day
Cost: $0.87 + $0.44 = $1.31/day ($39/month)
Same workload on GPT-5.5: $1,350/month

Chat application (10K messages/day):

~5M input tokens, ~2M output tokens
Cost: $2.18 + $1.74 = $3.92/day ($118/month)
Same workload on Claude Sonnet 4.6: $2,010/month

We documented real-world costs from autonomous coding sessions in our 13 cents per session analysis. Those numbers now hold permanently.

The cache hit multiplier

The pricing above gets even better with prefix caching. DeepSeek charges cache hits at roughly 1/12th of the standard input rate. If your application has stable system prompts or repeated context (which most do), your effective input cost drops dramatically.

Tools like Reasonix are already exploiting this, achieving 99%+ cache hit rates and cutting effective costs by another 5x on top of the permanent discount. For API setup details, see our DeepSeek V4 API guide.

The Huawei Ascend 950 angle

DeepSeek is running V4 Pro on Huawei Ascend 950 chips, not NVIDIA hardware. This matters for two reasons:

Supply independence. DeepSeek is not constrained by US export controls on NVIDIA GPUs. They can scale capacity without competing for H100/B200 allocation.
Cost structure. Huawei chips are cheaper to procure in China than NVIDIA equivalents. This lower hardware cost is part of what makes the aggressive pricing sustainable long-term.

The permanent discount signals that DeepSeek’s unit economics work at these prices. They are not burning cash to acquire users. The Ascend 950 infrastructure gives them a structural cost advantage that Western providers cannot easily match while dependent on NVIDIA silicon.

Impact on the AI pricing war

This move puts pressure on every other provider. The pricing gap between DeepSeek and Western frontier models was already large. Making it permanent forces competitors to respond or accept that cost-sensitive workloads will migrate.

What we expect to see:

OpenAI and Anthropic will likely introduce more aggressive caching discounts and batch pricing tiers rather than cutting headline rates.
Google is already competitive on Flash pricing but may need to cut Pro rates.
Open-source deployments become harder to justify economically. Self-hosting V4 Pro costs more than using the API unless you have specific privacy or latency requirements.

For a deeper look at V4 Pro’s capabilities that justify building on it, see our DeepSeek V4 Pro complete guide.

Who should switch now

Switch if:

You are paying for GPT-5.5 or Claude Opus and your workload does not require their specific strengths (multimodal, specific safety behaviors)
You are using a mid-tier model (Sonnet, GPT-5.4) and want equivalent quality at 17x lower cost
You are building cost-sensitive applications where API spend is a significant line item
You want to lock in pricing without worrying about promo expirations

Stay where you are if:

You need multimodal capabilities (V4 Pro is text-only)
Your compliance requirements mandate US-based providers
You are deeply integrated with OpenAI/Anthropic tooling and the migration cost exceeds savings
You need guaranteed SLAs that DeepSeek does not yet offer

The bottom line

DeepSeek V4 Pro at $0.87/1M output tokens is not a promotional stunt. It is the new baseline for what frontier AI should cost. A model that matches GPT-5.5 on coding benchmarks and beats it on math now costs 34.5x less per output token, permanently.

The AI pricing war just got a new floor. Every developer building on LLM APIs needs to recalculate their cost projections. The numbers have changed, and they are not going back up.

FAQ

Is the 75% discount really permanent or could DeepSeek raise prices later?

DeepSeek announced on May 23, 2026 that the discount is permanent with no expiration date. They could theoretically raise prices in the future (any company can), but the announcement explicitly removes the May 31 deadline and establishes these as standard rates. The Huawei Ascend 950 infrastructure gives them sustainable unit economics at this price point.

How does V4 Pro quality compare to GPT-5.5 at 34.5x lower cost?

V4 Pro scores 80.6% on SWE-bench Verified vs GPT-5.5’s 82.1%. On LiveCodeBench it scores 93.5% vs 91.8%. On math (AIME 2025), V4 Pro scores 92.7% vs GPT-5.5’s 89.4%. Quality is comparable or better on coding and math. GPT-5.5 has advantages in multimodal tasks and certain creative writing scenarios.

Does the permanent discount apply to V4 Flash too?

Yes. V4 Flash pricing is also permanently discounted. New rates are $0.07/1M input tokens and $0.28/1M output tokens. See our V4 Flash guide for details on when to use Flash vs Pro.

Can I still get cache hit discounts on top of the permanent price cut?

Yes. Cache hits are priced at approximately $0.036/1M tokens, which is roughly 1/12th of the standard input rate. The cache discount stacks with the permanent 75% reduction. This is how tools achieve effective costs under $0.15 per coding session.

What are the rate limits at the discounted price?

Rate limits remain unchanged from the promotional period. Standard tier gets 60 requests per minute and 1M tokens per minute throughput. Higher tiers are available. Check the DeepSeek V4 API guide for current limits and how to request increases.

Should I migrate from OpenAI/Anthropic to DeepSeek?

If your workload is primarily text-based (coding, analysis, chat, RAG) and you do not have strict compliance requirements mandating US providers, the cost savings are substantial enough to justify migration for most use cases. Start by running your eval suite against V4 Pro to verify quality meets your bar, then migrate incrementally.

Why can DeepSeek offer prices this low?

Three factors: (1) Huawei Ascend 950 chips are cheaper to procure in China than NVIDIA GPUs, (2) the MoE architecture activates only 49B of 1.6T parameters per token, keeping compute costs low, and (3) DeepSeek’s aggressive prefix caching infrastructure reduces redundant computation. The permanent discount signals these economics are sustainable, not subsidized.

How does this compare to Xiaomi’s MiMo V2.5 Pro price cut?

On May 26, Xiaomi cut MiMo V2.5 Pro to the exact same price point: $0.435 input, $0.87 output. They also slashed cached input to $0.0036/M (99% reduction). Both models now cost the same — the choice comes down to workload. See our MiMo V2.5 Pro vs DeepSeek V4-Pro comparison for a detailed breakdown.