How OpenAI’s DevDay 2025 Redefined AI Development
If you’ve ever felt like AI is evolving faster than your morning coffee cools, you’re not wrong. OpenAI’s DevDay 2025 was a masterclass in acceleration, unveiling tools that don’t just help developers build AI; they let AI build itself. Yes, really.
From Codex, the AI coding agent that wrote most of its own tools, to AgentKit, a drag-and-drop suite for building autonomous agents, OpenAI has officially entered the “AI builds AI” era. And if that sounds like science fiction, buckle up – it’s already live.
Let’s dive into what was announced, why it matters, and how it’s about to change everything from weekend hackathons to enterprise automation.

Codex: Your New AI Developer (Who Doesn’t Ask for a Pay Rise)
Codex has graduated from autocomplete to full-blown co-worker. It now:
- Writes code, runs tests, and reviews pull requests
- Integrates with Slack: tag @Codex in a thread and it picks up the context and completes the task
- Works across IDEs, terminals, and cloud containers
- Builds entire features autonomously
During DevDay, OpenAI revealed that Codex wrote 80% of the Agent Builder tool in under six weeks. That’s not just impressive; it’s a glimpse into a future where AI builds the tools that build better AI.
Codex is powered by GPT-5-Codex, a model optimised for agentic coding. It dynamically adjusts its “thinking time” based on task complexity, sometimes working for hours on a single problem. It’s like having a junior developer who never sleeps, never complains, and hands back tested code ready for your review.
Real-world impact? OpenAI engineers now complete 70% more pull requests per week, and Cisco has cut code review times by up to 50% using Codex.
AgentKit: The Swiss Army Knife for Building AI Agents
If Codex is your AI developer, AgentKit is your AI factory. It’s a modular toolkit designed to help developers and enterprises build, deploy, and optimise AI agents—without the usual chaos of fragmented tools.
What’s Inside AgentKit?
- Agent Builder: A visual canvas (think Canva for agents) where you drag and drop logic, tools, and workflows. It supports versioning, preview runs, and inline evaluations.
- ChatKit: Embeddable chat interfaces that feel native to your product. Canva built a support agent in under an hour using ChatKit.
- Connector Registry: A central admin panel to manage secure connections to tools like Dropbox, Google Drive, SharePoint, and Microsoft Teams.
- Evals for Agents: Performance evaluation tools including datasets, trace grading, automated prompt optimisation, and third-party model support.
Ramp built a procurement agent in just a few hours. LY Corporation orchestrated a multi-agent workflow in under two hours. Klarna’s support agent now handles two-thirds of all tickets.
Agent Builder: Drag, Drop, Deploy
Before AgentKit, building agents meant juggling orchestration scripts, custom connectors, manual prompt tuning, and frontend work. Now, you can:
- Start with a blank canvas or prebuilt templates
- Collaborate across product, legal, and engineering
- Slash iteration cycles (Ramp reports a 70% reduction)
- Go from idea to live agent in two sprints instead of two quarters
It’s not just faster; it’s collaborative, visual, and version-controlled. And yes, it’s still in beta, but it’s already transforming workflows.
ChatKit: Embedding Conversational Agents Made Easy
Deploying chat UIs used to be a nightmare: streaming responses, managing threads, designing the UI. ChatKit fixes that.
- Embed chat agents into apps or websites
- Customise branding and workflows
- Handle complex interactions with ease
HubSpot, Canva, and others are already using ChatKit to power support, onboarding, and research agents. It’s fast, flexible, and built for scale.
Evals: Because Performance Matters
Building agents is one thing. Making sure they work reliably is another. That’s where Evals for Agents comes in.
- Datasets: Build and expand evals with automated graders
- Trace grading: Assess workflows step-by-step
- Prompt optimisation: Improve prompts based on feedback
- Third-party model support: Evaluate models beyond OpenAI
Carlyle used Evals to cut development time by 50% and boost agent accuracy by 30%.
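Trace grading is easier to picture with a toy example. The Python sketch below records an agent run as a list of steps and checks each step on its own. A failure is then pinned to the step that caused it, rather than showing up only as a bad final answer. The `Step` record, the `grade_trace` function, and the step names `search` and `draft_po` are hypothetical illustrations of the idea. They are not the trace format or API that OpenAI’s Evals product actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One recorded step of an agent run (hypothetical trace format)."""
    name: str    # the tool or sub-agent that ran at this step
    output: str  # what that step produced


# A check inspects a single step and returns True if it looks correct.
Check = Callable[[Step], bool]


def grade_trace(trace: list[Step], checks: dict[str, Check]) -> tuple[float, list[str]]:
    """Grade a trace step by step.

    Steps without a registered check are skipped. Returns the fraction of
    checked steps that passed and the names of the steps that failed.
    A trace with no checked steps scores 1.0, since nothing failed.
    """
    checked = [step for step in trace if step.name in checks]
    failures = [step.name for step in checked if not checks[step.name](step)]
    if not checked:
        return 1.0, []
    return (len(checked) - len(failures)) / len(checked), failures


if __name__ == "__main__":
    # Hypothetical procurement-style run: the drafting step produced nothing.
    trace = [
        Step("search", "3 suppliers found"),
        Step("draft_po", ""),
        Step("reply", "PO drafted"),
    ]
    checks = {
        "search": lambda s: "supplier" in s.output,
        "draft_po": lambda s: s.output != "",
    }
    score, failed = grade_trace(trace, checks)
    print(score, failed)  # prints: 0.5 ['draft_po']
```

Grading per step shows *where* the workflow broke. In this run the final reply looks fine, but the empty purchase-order draft is flagged.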
Reinforcement Fine-Tuning: Teaching AI to Think Better
OpenAI also introduced Reinforcement Fine-Tuning (RFT) for its o4-mini and GPT-5 models. This lets developers train models using custom reward functions, not just labelled data.
Why it matters:
- Tailor models for nuanced tasks (e.g. legal reasoning, medical coding)
- Use programmable graders to score outputs
- Optimise for clarity, correctness, and domain-specific behaviour
Early adopters like Accordance AI and Ambience Healthcare have seen accuracy improvements of up to 39%.
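To make “custom reward functions” concrete, here is a minimal Python sketch of a programmable grader for the medical-coding example above. It scores a model’s answer against a reference and returns a number between 0 and 1 that could serve as a reward, with partial credit for a near miss. The function name `grade_medical_code`, the sample codes, and the partial-credit rule are hypothetical. This is not OpenAI’s grader configuration format, only an illustration of the shape of the idea.

```python
def grade_medical_code(output: str, reference: str) -> float:
    """Hypothetical grader: score a predicted billing code against a reference.

    Returns 1.0 for an exact match (ignoring case and surrounding spaces),
    0.5 if only the category before the dot matches, and 0.0 otherwise.
    """
    predicted = output.strip().upper()
    expected = reference.strip().upper()
    if predicted == expected:
        return 1.0
    # Partial credit: same category prefix, e.g. "E11.65" vs "E11.9".
    if predicted.split(".")[0] == expected.split(".")[0]:
        return 0.5
    return 0.0


if __name__ == "__main__":
    print(grade_medical_code(" e11.9 ", "E11.9"))  # prints: 1.0
    print(grade_medical_code("E11.65", "E11.9"))   # prints: 0.5
    print(grade_medical_code("I10", "E11.9"))      # prints: 0.0
```

The partial-credit rule is the interesting design choice. A grader that only returns 0 or 1 gives the model no signal that it was close. Graded scores let training reward answers that are nearly right and push them toward exactly right.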
Apps SDK: ChatGPT Becomes an Operating System
OpenAI didn’t stop at agents. They also launched the Apps SDK, turning ChatGPT into a full-blown app platform.
What You Can Do
- Build apps that live inside ChatGPT
- Use natural language to trigger actions (e.g. “Spotify, make a playlist”)
- Embed interactive UIs like maps, video players, and forms
- Connect to existing backends and user accounts
Launch partners include Spotify, Canva, Coursera, Zillow, Booking.com, and Expedia. More are coming soon, including Uber, DoorDash, and Target.
Apps can appear:
- Inline in chat
- Fullscreen for complex tasks
- Picture-in-picture for continuous engagement
It’s a seamless, conversational experience, and it’s already reaching ChatGPT’s 800 million weekly users.
What This Means for You
Developers
You’re no longer just coding; you’re designing systems. Codex is your pair programmer. AgentKit is your toolkit. ChatKit handles the frontend. You can go from idea to production in a weekend.
Founders & Creators
Prototyping AI-native products used to take months. Now it fits in a hackathon: visualise workflows, test them, and deploy with little or no backend code of your own.
Professionals
Think of your job as a system. Researching, reporting, managing: these are workflows you can automate. Build an agent to do the work for you.
AI Building AI: The Feedback Loop Has Begun
Here’s the big picture: AI is now building the tools that build better AI. This feedback loop is going to compound innovation at a rate we’ve never seen before.
Codex wrote 80% of Agent Builder. Agent Builder lets you build agents that use Codex. The cycle is self-reinforcing—and it’s already live.
Pricing & Availability
- Codex: Included with ChatGPT Plus, Pro, Business (formerly Team), Edu, and Enterprise
- AgentKit: Standard API model pricing
- Apps SDK: Available in preview; monetisation details coming soon
- RFT: Available for verified organisations using o4-mini and GPT-5
Final Thoughts: The Future Is Drag-and-Drop
OpenAI’s DevDay 2025 wasn’t just a product showcase—it was a paradigm shift. We’re entering a world where building intelligent systems is as simple as dragging blocks on a canvas. Where AI agents can be embedded, evaluated, and optimised in hours. Where your co-worker might be Codex, and your product manager might be a workflow.
So whether you’re a developer, founder, or just someone curious about the future of tech, now’s the time to dive in. Play with Agent Builder. Embed ChatKit. Run Evals. The tools are here. The future is building itself.
And if you’re still wondering whether AI is moving too fast… well, it’s already writing its own tools. You might want to keep up.