OpenAI Releases GPT-5.5: Faster Reasoning, Smarter Coding, and a New Era of AI Agents
OpenAI has officially launched GPT-5.5, its most powerful language model to date, just six weeks after releasing GPT-5.4. The new model posts higher benchmark scores than its predecessor in coding, agentic task execution, scientific reasoning, and knowledge work, strengthening OpenAI’s position in a rapidly accelerating AI race.
The announcement came on April 23, 2026, with GPT-5.5 rolling out immediately to ChatGPT Plus, Pro, Business, and Enterprise subscribers. OpenAI also announced that GPT-5.5 now powers Codex, its AI coding agent, marking a significant upgrade for developers who rely on automated code generation and debugging.
What Is GPT-5.5 and Why Does It Matter?
GPT-5.5 is OpenAI’s latest iteration in the GPT-5 family, known internally by the codename “Spud.” But there’s nothing plain about what it can do. OpenAI describes it as “a new class of intelligence for real work,” and the benchmark results reported below are consistent with that claim.
Unlike previous incremental updates, GPT-5.5 shows meaningful improvements across nearly every evaluation category. It was built and served on NVIDIA GB200 NVL72 systems, which gave OpenAI the infrastructure to scale the model’s reasoning capabilities without sacrificing speed or token efficiency.
What makes this release particularly notable is the speed of iteration. Six weeks between major model releases is an extraordinarily fast turnaround — even by the aggressive standards of today’s AI industry. It signals that OpenAI is in a full-sprint development cycle, likely in response to growing competitive pressure from Anthropic, Google DeepMind, and Meta.
Key Features and Improvements in GPT-5.5
Superior Coding Performance
Coding is where GPT-5.5 shines brightest. On Terminal-Bench 2.0 — one of the most demanding real-world coding evaluations available — GPT-5.5 scored 82.7%, a 7.6 percentage point improvement over GPT-5.4. This benchmark tests an AI’s ability to handle complete development workflows inside a terminal environment, not just isolated code snippets.
According to OpenAI, GPT-5.5 finishes Codex tasks in fewer tokens with fewer retries. That means developers get faster results with less back-and-forth. Teams using Codex are already reporting that they can ship end-to-end features from natural language prompts, cut debugging time from days to hours, and compress weeks of experimentation into overnight progress on complex codebases.
GPT-5.5 also narrowly edges out Anthropic’s Claude Mythos Preview on Terminal-Bench 2.0, which is significant given that Claude Mythos was widely considered the strongest coding model before this release.
Breakthrough Mathematical Reasoning
On FrontierMath Tier 4, arguably the most difficult mathematics benchmark in existence, GPT-5.5 Pro scored 39.6%. That is about 1.7 times the 22.9% scored by Claude Opus 4.7, the previous leader on this benchmark. FrontierMath Tier 4 consists of problems that typically require PhD-level mathematical expertise, including novel proofs and multi-step derivations that cannot be solved by pattern matching alone.
This leap in mathematical reasoning has direct implications for scientific research. OpenAI specifically called out early-stage scientific research as one of GPT-5.5’s key strengths — and the model also shows clear improvements on GeneBench, an evaluation focused on multi-stage scientific data analysis in genetics and quantitative biology.
Agentic Task Execution
One of the most important upgrades in GPT-5.5 is its ability to handle complex multi-step workflows more autonomously. OpenAI describes this as “agentic” behavior — the model can plan, use tools, check its own work, and course-correct across long task sequences with minimal human input.
On MCP Atlas, a benchmark designed specifically to evaluate multi-step agentic task completion, GPT-5.5 improved by 8.1 percentage points over GPT-5.4. On ARC-AGI-2, which tests adaptive general reasoning that cannot be memorized from training data, the improvement was 11.7 percentage points — the largest single gain across all benchmarks.
The practical implication is that GPT-5.5 can now take on more complex projects end-to-end. Rather than completing one step and waiting for user confirmation, it can independently manage research pipelines, write and test code, browse the web for information, and produce final deliverables — all within a single session.
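To make the plan, act, check, and course-correct cycle described above more concrete, here is a toy sketch in Python. It is not OpenAI’s implementation and it calls no real model or API: the `run_agent` function, the step names, and the `flaky_tests` action are all hypothetical stand-ins chosen for illustration. It only shows the control flow of retrying a step until a check passes.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], str]]  # (step name, action that returns a result)


def run_agent(steps: List[Step],
              check: Callable[[str, str], bool],
              max_retries: int = 2) -> List[Tuple[str, int, str]]:
    """Run each planned step, verify its result, and retry on failure.

    Returns a log of (step name, attempts used, accepted result).
    """
    log = []
    for name, action in steps:
        for attempt in range(1, max_retries + 2):
            result = action()            # "use a tool"
            if check(name, result):      # "check its own work"
                log.append((name, attempt, result))
                break                    # move on to the next planned step
        else:
            # Every attempt failed the check: give up on this plan.
            raise RuntimeError(f"step {name!r} failed after {max_retries + 1} attempts")
    return log


# Hypothetical tool: a test run that fails once, then passes.
calls = {"n": 0}


def flaky_tests() -> str:
    calls["n"] += 1
    return "pass" if calls["n"] >= 2 else "fail"


plan = [
    ("write_code", lambda: "def add(a, b): return a + b"),
    ("run_tests", flaky_tests),
]
log = run_agent(plan, check=lambda name, result: result != "fail")
for name, attempts, _ in log:
    print(f"{name}: accepted after {attempts} attempt(s)")
```

In this sketch the second step needs two attempts before its check passes, which is the kind of self-correction the article attributes to the model, reduced to its simplest form.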
Knowledge Work at Scale
On GDPval, a benchmark that evaluates AI across 44 professional occupations including legal research, financial modeling, medical documentation, and software engineering, GPT-5.5 scored 84.9%, up from GPT-5.4’s 83.0%. That 1.9 percentage point gain is modest on paper, but it is consistent with the improvements seen on the other benchmarks and matters most for professionals who depend on reliable output across a large volume of routine tasks.
OpenAI says the model intuits what you need before you ask — meaning it is better at understanding intent from ambiguous or underspecified prompts, which is exactly what professionals need when using AI as a daily work tool.
GPT-5.5 vs GPT-5.4: What Actually Changed?
GPT-5.5 improves on 9 out of 10 shared benchmarks compared to GPT-5.4. The biggest gains are in reasoning-heavy domains: ARC-AGI-2 (+11.7pp), MCP Atlas (+8.1pp), and Terminal-Bench 2.0 (+7.6pp). Knowledge work benchmarks like GDPval saw smaller but consistent improvements.
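The figures quoted in this article mix absolute scores and percentage point gains, so a short calculation helps put them on the same footing. The sketch below uses only numbers stated above; the GPT-5.4 Terminal-Bench 2.0 score is derived here from the reported gain rather than quoted from OpenAI, and the helper name `relative_gain` is ours.

```python
# Scores in percent, as quoted in this article.
GPT55_TERMINAL_BENCH = 82.7
TERMINAL_BENCH_GAIN_PP = 7.6      # percentage points over GPT-5.4
GPT55_GDPVAL, GPT54_GDPVAL = 84.9, 83.0

# Derived, not reported directly: GPT-5.4's implied Terminal-Bench 2.0 score.
gpt54_terminal_bench = round(GPT55_TERMINAL_BENCH - TERMINAL_BENCH_GAIN_PP, 1)


def relative_gain(new: float, old: float) -> float:
    """Relative improvement in percent (not percentage points)."""
    return round((new - old) / old * 100, 1)


print("Implied GPT-5.4 Terminal-Bench 2.0 score:", gpt54_terminal_bench)   # 75.1
print("Terminal-Bench 2.0 relative gain (%):",
      relative_gain(GPT55_TERMINAL_BENCH, gpt54_terminal_bench))           # 10.1
print("GDPval gain (pp):", round(GPT55_GDPVAL - GPT54_GDPVAL, 1))          # 1.9
print("GDPval relative gain (%):",
      relative_gain(GPT55_GDPVAL, GPT54_GDPVAL))                           # 2.3
```

A 7.6 point gain on Terminal-Bench 2.0 is roughly a 10% relative improvement, while the 1.9 point GDPval gain is closer to 2%, which matches the article’s description of larger gains in reasoning-heavy domains and smaller ones in knowledge work.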
Speed-wise, GPT-5.5 matches GPT-5.4’s per-token latency in real-world serving despite being a more capable model. However, it comes at a higher price: $5 per million input tokens and $30 per million output tokens, compared to $2.50 and $15 for GPT-5.4. OpenAI is betting that the performance improvements justify a doubling of per-token prices for users who need the best results.
For developers using Codex, the efficiency gains may offset the higher token costs. If the model completes tasks in fewer tokens with fewer retries, the total cost per completed task could be similar to or lower than GPT-5.4.
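Whether the efficiency gains really offset the price increase depends on how many tokens and retries a task needs, which will vary by workload. The sketch below works through one example using the per-token prices quoted above. The token counts and attempt numbers are hypothetical, chosen only to illustrate the arithmetic, and the `Pricing` and `task_cost` names are ours rather than part of any OpenAI SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float    # USD per 1M input tokens
    output_per_million: float   # USD per 1M output tokens


# Prices as quoted in this article.
GPT_5_4 = Pricing(input_per_million=2.50, output_per_million=15.00)
GPT_5_5 = Pricing(input_per_million=5.00, output_per_million=30.00)


def task_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
              attempts: int = 1) -> float:
    """Total USD cost of a task, assuming every attempt uses the same token counts."""
    per_attempt = (input_tokens * pricing.input_per_million
                   + output_tokens * pricing.output_per_million) / 1_000_000
    return per_attempt * attempts


# Hypothetical workload: the older model needs three attempts,
# the newer one finishes in a single attempt with fewer tokens.
old = task_cost(GPT_5_4, input_tokens=200_000, output_tokens=40_000, attempts=3)
new = task_cost(GPT_5_5, input_tokens=150_000, output_tokens=30_000, attempts=1)

print(f"GPT-5.4: ${old:.2f} per completed task")   # $3.30
print(f"GPT-5.5: ${new:.2f} per completed task")   # $1.65
```

Because both the input and output prices doubled, the break-even point is simple: for the same mix of input and output tokens, GPT-5.5 is cheaper per completed task only if it uses less than half the total tokens across all attempts.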
Availability and Access
GPT-5.5 is now rolling out to ChatGPT Plus, Pro, Business, and Enterprise users. GPT-5.5 Pro is available to Pro, Business, and Enterprise subscribers. API access is available immediately for developers.
Codex has also been upgraded to run on GPT-5.5 — significant for software development teams that have built workflows around the agent. The underlying intelligence is now substantially better at reasoning, planning, and completing complex tasks.
As noted earlier, the model is built and served on NVIDIA GB200 NVL72 systems.
Safety and Safeguards
OpenAI says GPT-5.5 ships with its strongest set of safeguards to date. This release comes just days after the U.S. Department of Homeland Security demonstrated to Congress how jailbroken AI models can generate dangerous content — adding pressure on all major AI labs to tighten their safety practices.
The model performs better at detecting and declining harmful requests, while being less likely to over-refuse legitimate queries — a balance that previous models have consistently struggled to achieve.
What This Means for the AI Industry
The release of GPT-5.5 just six weeks after GPT-5.4 signals something important: development cycles are compressing dramatically. Gaps between major releases that were once measured in many months are now measured in weeks. One reason this pace is possible is that AI labs are now using AI to build AI: OpenAI confirmed that Codex played a significant role in GPT-5.5’s own development.
Anthropic’s annualized revenue is approaching $19 billion, while OpenAI has surpassed $25 billion. Both companies are racing toward capabilities that could redefine how knowledge work, software development, and scientific research are conducted. Google DeepMind and Meta are not far behind.
For everyday users, the implication is clear: AI tools are getting better faster than most people are adapting to them. GPT-5.5 is not just a marginal upgrade — it is a model that can genuinely operate as a semi-autonomous agent across real professional workflows. The question is no longer whether AI can do complex work, but how quickly organizations will leverage that capability.
Final Thoughts
GPT-5.5 is a significant step forward in AI capability, particularly for coding, scientific reasoning, and autonomous task execution. The FrontierMath and Terminal-Bench results are especially impressive and will likely force competitors to accelerate their own release timelines.
For developers, researchers, and professionals who rely on AI daily, GPT-5.5 represents a genuine productivity upgrade. The higher price point is a real consideration, but for intensive use cases the efficiency gains appear to justify the cost.
As OpenAI moves toward a potential IPO and the broader AI industry approaches what many researchers describe as an inflection point in capability, GPT-5.5 is a clear signal that the pace of progress is not slowing down. If anything, it is accelerating.
The release follows months of speculation about OpenAI’s next model milestone. According to OpenAI’s official research page, the company continues to push boundaries in reasoning and multimodal capability with each successive model generation, and GPT-5.5 is one of the larger steps along that trajectory.