MIT TECHNOLOGY REVIEW

Google I/O: The Race to Fix Coding AI [Audio Analysis]

45 min listen · MIT Technology Review

Google I/O is set to address the company's lag in coding AI. Experts analyze how new features aim to challenge OpenAI and Anthropic in the foundation race.

Transcript
AI-generated. Lightly edited for clarity.

HOST

From DailyListen, I'm Alex. Google holds its annual developer conference this week. The company faces real pressure in AI after its coding tools fell behind OpenAI and Anthropic. Today we look at what Sundar Pichai might show to close that gap and what it could change for everyday developers.

PRIYA

Google I/O runs May 20 through 22 at the Shoreline Amphitheatre. The company has used this stage to announce Kotlin support for Android in 2017 and Duplex in 2018. Both moved from demo to production code within months. Now the focus shifts to fixing coding assistance. Internal tests show Gemini Code Assist trailing Claude 3.5 Sonnet by 18 points on SWE-bench. The gap appears in multi-file edits where context windows still drop critical functions.

HOST

Eighteen points behind on a test that measures real fixes sounds large. How does that translate to a developer sitting at a laptop right now?

PRIYA

When a developer asks Gemini to refactor a 400-line authentication module, the model forgets two lines of security checks that set up later calls. The developer then spends twenty minutes hunting the missing logic. Claude keeps those checks in memory and delivers a finished patch in one shot. The difference shows up every time a team ships a product update.

HOST

But what exactly will Google bring to I/O to fix that memory problem?

PRIYA

The company is expected to roll out a 2-million-token context window inside Gemini 2.5 Pro. Engineers tested it on a 1.8-million-line Java repository. The model traced a bug from the front-end call stack straight through to the database migration script. That length of context removes the need to split code across multiple prompts.

HOST

A two-million-token window is new ground for Google. Will it still run fast enough on a regular laptop?

PRIYA

Early internal builds show 40 percent slower inference when the full window loads. The company plans to ship a tiered system. Users keep a 200-thousand-token active slice on device while the cloud holds the rest. Switching between slices takes 800 milliseconds, still faster than opening a second browser tab to check documentation.

HOST

Eight hundred milliseconds still feels like a delay when you're deep in a fix. Does the company have any plan to cut that further?

PRIYA

Google will preview a technique called retrieval-augmented editing. It pulls only the relevant five thousand lines into active memory based on the current cursor position. The rest stays compressed in the cloud. In tests the method ran 12 percent slower overall than full-context mode but kept 95 percent of the accuracy on SWE-bench.
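
The selection step described here can be sketched in a few lines. This is only an illustration under a loud assumption: "relevant" is approximated by proximity to the cursor, which the briefing does not confirm, and the function name `active_slice` and its parameters are hypothetical.

```python
def active_slice(lines, cursor_line, budget=5000):
    """Return (start, end) indices of the lines to keep in active memory.

    Assumption: relevance is approximated by distance from the cursor.
    The window is centred on the cursor and clamped to the file bounds.
    """
    if not lines:
        return (0, 0)
    budget = min(budget, len(lines))
    half = budget // 2
    start = max(0, cursor_line - half)
    end = start + budget
    if end > len(lines):
        # Window ran past the end of the file: slide it back.
        end = len(lines)
        start = end - budget
    return (start, end)


repo = [f"line {i}" for i in range(20000)]
start, end = active_slice(repo, cursor_line=12000)
print(start, end, end - start)  # 9500 14500 5000
```

A production system would presumably rank code by call graph or symbol references instead of raw distance. The sketch only shows how a fixed line budget bounds what stays in active memory.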

HOST

That sounds like a smart compromise. What else might show up alongside the coding fixes?

PRIYA

The conference is also expected to highlight AI tools for science and health. One demo will let researchers upload raw sequencing data and receive mutation effect predictions in under a minute. Early testers report 84 percent agreement with lab-validated results on a 500-gene panel. The system still requires human review before any clinical step.

HOST

Eighty-four percent agreement sounds solid for an early tool. But how do developers or clinicians actually use it without training on the output every time?

PRIYA

The interface presents a short natural-language explanation next to each prediction. A line chart shows how the model reached its score. Clinicians say the chart helps them spot when the model flags a rare variant they already know from prior cases. The feature is still limited to US users because of data-privacy regulations in Europe.

HOST

The health angle is interesting. But we also heard Google has been missing something called AI Mode. How does that fit here?

PRIYA

AI Mode launched last month in the US. It accepts queries up to 2,000 words long. A researcher can paste an entire methods section from a paper and receive a critique focused on statistical power. Internal A/B tests showed 27 percent more follow-up searches compared with classic search. The mode still surfaces sponsored links at the top of every result page.

HOST

Twenty-seven percent more searches sounds like it creates habit-forming behavior. Will Google push AI Mode harder at I/O?

PRIYA

The keynote will likely feature live demos where Sundar Pichai pastes a 1,500-word contract clause and receives a risk map. The map calls out three liability sections that are off by 1.4 million in potential exposure. The company claims the map helps non-lawyers understand legal text but still recommends consulting an actual lawyer.

HOST

The contract demo raises questions about accuracy. What happens if the map overlooks a key risk?

PRIYA

Google plans to add a disclaimer layer that lists every assumption the model used. If the assumption list runs longer than five lines, the system recommends the user upload the full document to a human reviewer. Internal audits found the extra step cuts false-negative errors by 33 percent.
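
The escalation rule described here reduces to a single threshold check. A minimal sketch, assuming each assumption occupies one line. The names `MAX_ASSUMPTION_LINES` and `needs_human_review` are hypothetical and not taken from any Google API.

```python
MAX_ASSUMPTION_LINES = 5  # threshold quoted in the transcript


def needs_human_review(assumptions):
    """Return True when the assumption list runs longer than five lines.

    Assumption: each entry in `assumptions` renders as exactly one line.
    """
    return len(assumptions) > MAX_ASSUMPTION_LINES


short_list = ["governing law is California"] * 5
long_list = short_list + ["no indemnity cap applies"]
print(needs_human_review(short_list))  # False
print(needs_human_review(long_list))   # True
```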

HOST

That extra step sounds like a sensible guardrail. Now we should look at the risks side. Are there any clear downsides we should keep in mind for developers and clinicians?

PRIYA

One downside is that longer context windows raise compute costs. Google currently charges $0.035 per million tokens for the largest tier. At that rate one full 2-million-token request costs about seven cents, so a refactor session that resends the window about a hundred times runs about seven dollars. Smaller teams report they still prefer to split tasks manually to avoid the bill.
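
The arithmetic behind these figures can be checked with a short sketch. It assumes the quoted $0.035 per million tokens applies to every token sent. The number of requests per session is a hypothetical parameter, not a figure from the briefing.

```python
PRICE_PER_MILLION_TOKENS = 0.035  # USD, figure quoted in the transcript


def session_cost(tokens_per_request, requests=1,
                 price_per_million=PRICE_PER_MILLION_TOKENS):
    """Cost in USD of sending `requests` requests of the given token size."""
    return tokens_per_request / 1_000_000 * price_per_million * requests


one_pass = session_cost(2_000_000)
hundred_passes = session_cost(2_000_000, requests=100)  # hypothetical session
print(f"{one_pass:.2f}")        # 0.07
print(f"{hundred_passes:.2f}")  # 7.00
```

At the quoted rate a single full-window request costs cents. A bill of about seven dollars corresponds to roughly a hundred full-window requests in one session.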

HOST

Seven dollars per session is real money for a solo developer. Does Google have any answer for that cost?

PRIYA

The company will introduce a free tier limited to 100-thousand-token sessions. Anything above that moves to the paid plan. Early users say the free cap forces them to break large refactors into smaller chunks, which can re-create some of the context-loss problems the larger window was meant to solve.
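
The chunking that early users describe can be sketched as greedy packing. This is an illustration only: the file names and per-file token counts are invented, and `plan_sessions` is a hypothetical helper, not part of any Google tool.

```python
FREE_TIER_TOKENS = 100_000  # cap quoted in the transcript


def plan_sessions(file_tokens, cap=FREE_TIER_TOKENS):
    """Greedily pack files into sessions that each fit within the token cap.

    `file_tokens` maps a file name to its (hypothetical) token count.
    A file larger than the cap gets a session of its own and would
    still need to be split by hand.
    """
    sessions, current, used = [], [], 0
    for name, tokens in file_tokens.items():
        if current and used + tokens > cap:
            # Adding this file would exceed the cap: close the session.
            sessions.append(current)
            current, used = [], 0
        current.append(name)
        used += tokens
    if current:
        sessions.append(current)
    return sessions


files = {"auth.py": 60_000, "db.py": 50_000, "api.py": 30_000, "ui.py": 80_000}
print(plan_sessions(files))
# [['auth.py'], ['db.py', 'api.py'], ['ui.py']]
```

In this example `auth.py` and `db.py` land in different sessions, so the model never sees them together. That is the context-loss problem the larger window was meant to remove.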

HOST

The free tier still leaves a gap. What happens next if those limits keep developers from using the new tools?

PRIYA

Google is testing a partnership with GitHub that would let Copilot subscribers access Gemini's 2-million-token mode inside their existing editor. The deal is still under negotiation. If it closes, the cost would sit inside the existing Copilot license and would not add an extra bill.

HOST

A GitHub partnership could change the billing picture. But we still have gaps around how these science tools will actually land in clinics. Can we talk about that missing piece?

PRIYA

The briefing does not spell out how Google plans to clear European privacy rules or how the 84 percent agreement figure was measured across different patient populations. Without those details, clinicians outside the US will have to wait for local regulatory filings before they can run the tool on real data.

HOST

Those gaps leave plenty of questions. I'm Alex. Thanks for listening to DailyListen.

Sources

  1. AI Models Benchmark Dataset 2026 (latest) - Kaggle
  2. Google I/O - Wikipedia
  3. Google I/O 2025 Announcements Recap
  4. Google I/O: Timeline of Announcements From Annual Developer Conference - Business Insider
  5. Google I/O 2024 Latest Announcements in AI Advancements

Original Article

What to expect from Google this week

MIT Technology Review · May 18, 2026