MIT TECHNOLOGY REVIEW
DeepSeek V4: China’s New AI Powerhouse Explained
DeepSeek’s new open-source V4 model challenges U.S. AI dominance by offering high performance at lower costs, fueling China’s drive for tech self-reliance.
HOST
From DailyListen, I'm Alex. DeepSeek just dropped V4, their latest AI model, and it's got the tech world buzzing. A Chinese startup claims this 1-trillion-parameter beast matches top U.S. models on coding benchmarks while running way cheaper. Stocks dipped, VCs are calling it a wake-up call like Sputnik in 1957. Why does this matter now, especially with U.S.-China tensions? We're joined by Priya, our technology analyst, who tracks how these models shift real-world power in AI.
PRIYA
What this unlocks is handling million-token prompts that dwarf what most models manage. DeepSeek V4 packs a 1M-token context window—think entire codebases or hour-long video transcripts in one go. Paired with Engram conditional memory, it remembers key patterns across those lengths without bloating compute. Internal benchmarks reported by Reuters show it beating the Claude and GPT series on extremely long code prompts. Pre-release claims hit 80-85% on SWE-bench, up from V3's levels. For developers, that means debugging massive repos without chopping them up. But here's the catch: these are company-reported numbers. Independent tests on Arena.ai put V4 Pro third among open-source models, 14th overall in the code arena—strong, but not unchallenged.
HOST
That million-token context sounds huge. A million words is like five thick novels. How does Engram make that practical without exploding costs?
PRIYA
Engram offloads static retrieval to cheaper DRAM, so it doesn't hammer the GPU every time. V4's MoE architecture activates just 32B parameters per token out of 1T total—most of the model sleeps. Add Huawei Ascend chips, which cost less per inference hour than Nvidia A100 or H100 clusters. Result? V4 runs inference dirt cheap compared to U.S. rivals. A V4 Lite at 200B parameters keeps the 1M context but scales down for broader use. Sitepoint notes a 10-fold leap over V3.2 on some benchmarks—V3.2 scored just 5 points, no typo. Developers get Claude-level coding help without the premium price tag.
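The sparse activation Priya describes, where only a sliver of a huge model runs per token, can be sketched as top-k expert routing. This is a minimal illustrative version: the 64 experts, top-8 routing, and 16-dim vectors are toy values, not DeepSeek's actual configuration.

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=8):
    """Route one token through a sparse Mixture-of-Experts layer.

    Only the top_k highest-scoring experts run, so compute per token
    scales with top_k, not with the total expert count.
    """
    scores = x @ gate_w                      # router logits, one per expert
    top = np.argsort(scores)[-top_k:]        # indices of the chosen experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                 # softmax over the chosen experts
    # Weighted sum of the selected experts' outputs; the rest stay idle.
    return sum(w * experts[i](x) for i, w in zip(top, weights))

# Toy setup: 64 experts, but each token touches only 8 of them.
rng = np.random.default_rng(0)
dim, num_experts = 16, 64
mats = [rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(num_experts)]
experts = [lambda x, W=W: x @ W for W in mats]
gate_w = rng.standard_normal((dim, num_experts))

out = moe_forward(rng.standard_normal(dim), experts, gate_w)
print(out.shape)  # (16,)
```

The same principle, scaled up, is how a 1T-parameter model can activate only 32B parameters per token.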
HOST
Huawei chips instead of Nvidia—that's bold, given U.S. sanctions. Does this prove China can ditch Nvidia dependence?
PRIYA
V4's the first DeepSeek model tuned for Huawei Ascend, testing if homegrown silicon closes the gap. DeepSeek skipped giving Nvidia or AMD early access, per The Information—unusual, since chipmakers usually optimize ahead. But Ascend's lower inference costs give V4 a pricing edge. Morphllm.com details three architecture tweaks over V3: bigger MoE sparsity, Engram memory, native multimodal for text, images, videos. It topped Arena.ai's Vibe Code Benchmark as the number one open-weights model, beating Kimi K2.6 and even Gemini 3.1 Pro. Still, no word on training compute or dataset details—those gaps leave questions about how they hit these scores without Nvidia-scale resources.
HOST
Open-source weights make it accessible, but Taiwan banned it in government ops over security risks, and the U.S. Navy warned against use. Does V4's power amplify those concerns?
PRIYA
Security flags are real. Taiwan cited data exposure risks to China; the Navy pointed to ethical and security issues. DeepSeek runs under government censorship—queries on sensitive topics get blocked, unlike U.S. models shaped by corporate policies. V4's open-source release floods GitHub with weights anyone can fine-tune, raising misuse fears for code gen or worse. Yet the open model still competes with closed ones: second in Vals AI's comprehensive index, just 0.07% behind the leader. Marc Andreessen called DeepSeek R1 "AI's Sputnik moment" on X, warning that U.S. over-regulation hands China the lead. V4 fuels that debate—China's self-reliance push scores a win, but trust barriers persist for Western users.
HOST
Andreessen advises Trump on tech policy. His Sputnik analogy nods to 1957's space race kickoff. Is V4 shifting AI geopolitics like that?
PRIYA
Last week, the U.S. President met Nvidia's Jensen Huang at the White House over China's AI rise—DeepSeek's part of that worry. V4 matches Anthropic's Claude-Opus-4.6 and tops Alibaba's Qwen-3.5 on coding, math, STEM per company claims. As a 2026 launch from a 2023-founded firm, it challenges Big Tech exclusivity. Founder Liang Wenfeng's hedge fund background helped bootstrap without billions. But no validated real-world tests yet—Reddit's r/singularity post hypes benchmarks, yet gaps in MMLU or MATH comparisons mean we wait for independents. Geopolitically, it loosens Nvidia reliance, but U.S. sanctions forced this path.
HOST
V4 claims 80-85% SWE-bench and outperforms on long code. What's the everyday impact for, say, a software engineer?
PRIYA
Engineers feed V4 a full 1M-token repo—tens of thousands of lines—and get fixes that span files. Native multimodal handles code plus screenshots or video walkthroughs of bugs. Arena.ai calls it a "significant leap" from V3.2; it "overwhelmingly" led open-source in Vibe Code. Against GPT-5.4 and Gemini-3.1, internal tests show wins on long prompts. Pricing stays low via MoE's 32B active params and Ascend efficiency. A busy dev skips manual slicing and cuts hours off refactors. Drawback: censorship blocks some prompts, and no post-release user data confirms daily wins over Claude.
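A quick way to sanity-check that "feed the whole repo" workflow is to estimate the source tree's token count before sending it. This sketch uses the rough 4-characters-per-token heuristic, which is an approximation, not any tokenizer DeepSeek actually ships; the file extensions are illustrative too.

```python
from pathlib import Path

def estimate_repo_tokens(root, exts=(".py", ".js", ".ts"), chars_per_token=4):
    """Rough token estimate for a source tree (~4 chars per token).

    Helps decide whether a repo plausibly fits a 1M-token context
    window in one prompt, instead of chopping it into chunks.
    """
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            total_chars += len(path.read_text(errors="ignore"))
    return total_chars // chars_per_token

# If estimate_repo_tokens("my_repo") stays under 1_000_000, the whole
# tree can plausibly go into a single long-context prompt.
```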
HOST
Multimodal input for video too—that's new. But earlier Janus Pro for visuals wasn't as big a splash as chatbots. Does V4 change that?
PRIYA
V4 builds on Janus Pro's visual understanding with native text-image-video fusion in its 1M context. Process a 30-minute tutorial video, extract code snippets, rewrite in Python—all in one pass. That's key for fields like robotics or AR dev, where video data explodes. But V4 inherits DeepSeek's limits: government filters on politics, unlike U.S. models' corporate biases. Benchmarks shine—90% HumanEval claimed—but no independent MMLU or GPQA scores yet. For pros, it means cheaper long-context multimodal without OpenAI bills. Risks? Navy-style warnings mean enterprises hesitate.
HOST
No training details in reports—no compute FLOP, dataset size, or cost. How'd a sanctioned startup build a 1T-param model?
PRIYA
Gaps abound there. DeepSeek dodged Nvidia bans via efficiency tricks like MoE and multi-token prediction from R1 days—predicts multiple tokens at once, no feedback loop. Lee noted most models predict one word; DeepSeek trains for chains. V3 proved small teams beat GPU hordes. V4 likely scaled that on Ascend clusters, but without numbers, it's guesswork. They boasted parity with OpenAI pre-release, sparking Monday's stock frenzy. Success? Arena.ai ranks confirm coding strength. But unverified training leaves skeptics wondering if it's sustainable at this scale.
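The multi-token prediction Priya mentions can be sketched as a training loss: instead of one head predicting only the next token, several heads each predict a different future offset, so every step supervises a short chain. This toy NumPy version (the head count, vocab size, and token ids are made up) just computes the combined cross-entropy.

```python
import numpy as np

def multi_token_loss(logits, targets):
    """Cross-entropy averaged over several future positions at once.

    logits:  (num_heads, vocab) -- one prediction head per future offset
             (t+1, t+2, ...), scored in a single forward pass.
    targets: (num_heads,)       -- the true token id at each offset.
    """
    # Numerically stable softmax per head.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # Negative log-likelihood of each head's target token.
    nll = -np.log(probs[np.arange(len(targets)), targets])
    return nll.mean()

# Toy example: predict 3 future tokens from a vocabulary of 10.
rng = np.random.default_rng(1)
logits = rng.standard_normal((3, 10))
loss = multi_token_loss(logits, np.array([4, 1, 7]))
```

The efficiency claim is that one forward pass yields gradient signal for several positions, rather than one token per pass.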
HOST
V4 Lite at 200B params keeps the million-token window. Who grabs that first?
PRIYA
Startups and indie devs—run it locally or on cheap cloud without enterprise budgets. Same Engram memory, MoE sparsity, but a lighter footprint. Morphllm.com's guide hints at API starts, though illustrative. It ranked third open-source in the Arena code arena, second in the Vals index. Compared with V3.2's 5-point flop, V4's leap shows iteration pays. China angle: it pushes self-reliance, but Western bans like Taiwan's limit its spread. No licensing fine print yet, so adoption hinges on security audits.
HOST
Frenzy upended markets Monday after last month's hype. U.S. tech fell sharply in January 2025 after R1. Is V4 repeating that?
PRIYA
Yes—R1 topped GPT-4o benchmarks and crashed stocks on January 27. V4's open-source Vibe Code dominance reignites it. Andreessen's warning: U.S. rules cede ground. But no controversies beyond the bans—no major flaws reported. DeepSeek's mobile app surged in January 2025; V4 could explode that. Limits? Sitepoint.com hedged some sections as "projected," but the March 2026 launch did happen. For listeners, it means affordable AI coding tools now compete globally—watch enterprise pilots.
HOST
Three reasons V4 matters: long-context power for real codebases, cost edge via MoE and Ascend, open-source rivalry to closed giants. But bans and gaps temper the hype.
PRIYA
Spot on. First, 1M context plus Engram nails long prompts where Claude falters. Second, 32B active params on cheap Huawei chips slash bills—key for scale. Third, topping open-source charts pressures OpenAI, Anthropic to match efficiency. China's play reduces Nvidia lock-in, but security risks slow Western uptake. No training disclosures mean we track independents. DeepSeek, from 2023 upstart to benchmark king, forces everyone to adapt.
HOST
I'm Alex. V4 spotlights AI's new battleground—tech, cost, geopolitics. Dig into Arena.ai or morphllm.com yourself. Thanks for listening to DailyListen.
Sources
1. DeepSeek V4: Architecture, Benchmarks, and API Guide (2026)
2. What is DeepSeek? Here's a quick guide to the Chinese AI company | PBS News
3. DeepSeek V4 Benchmarks! : r/singularity - Reddit
4. Deepseek V4 First Wave Reviews: Huge Success or Big Flop ... - 36氪
5. DeepSeek V4 Released: What's New in the Latest Model ...
6. DeepSeek | Rise, Technologies, Impact, & Global Response
7. Deepseek introduces new technologies to the AI world - The Daily Cardinal
8. 12 Major Developments Since DeepSeek-R1's Release Last Month
9. DeepSeek AI: Company Overview, Founding team, Culture and ...
10. Three reasons why DeepSeek's new model matters
11. Three reasons why DeepSeek's new model V4 matters
Original Article
Three reasons why DeepSeek’s new model V4 matters
MIT Technology Review · April 24, 2026