OpenAI DevDay 2025: Codex, AgentKit, and the Dawn of AI That Builds AI

How OpenAI’s DevDay 2025 Redefined AI Development

If you’ve ever felt like AI is evolving faster than your morning coffee cools, you’re not wrong. OpenAI’s DevDay 2025 was a masterclass in acceleration, unveiling tools that don’t just help developers build AI; they let AI build itself. Yes, really.

From Codex, the AI coding agent that wrote most of its own tools, to AgentKit, a drag-and-drop suite for building autonomous agents, OpenAI has officially entered the “AI builds AI” era. And if that sounds like science fiction, buckle up – it’s already live.

Let’s dive into what was announced, why it matters, and how it’s about to change everything from weekend hackathons to enterprise automation.

Codex: Your New AI Developer (Who Doesn’t Ask for a Pay Rise)

Codex has graduated from autocomplete to full-blown co-worker. It now:

  • Writes code, runs tests, and reviews pull requests
  • Integrates with Slack: tag @Codex and it’ll pick up context and complete tasks
  • Works across IDEs, terminals, and cloud containers
  • Builds entire features autonomously

During DevDay, OpenAI revealed that Codex wrote 80% of the Agent Builder tool in under six weeks. That’s not just impressive; it’s a glimpse into a future where AI builds the tools that build better AI.

Codex is powered by GPT-5-Codex, a model optimised for agentic coding. It dynamically adjusts its “thinking time” based on task complexity, sometimes working for hours on a single problem. It’s like having a junior developer who never sleeps, never complains, and always commits clean code.

Real-world impact? OpenAI engineers now complete 70% more pull requests per week, and companies like Cisco have cut code review times by 50% using Codex.

AgentKit: The Swiss Army Knife for Building AI Agents

If Codex is your AI developer, AgentKit is your AI factory. It’s a modular toolkit designed to help developers and enterprises build, deploy, and optimise AI agents without the usual chaos of fragmented tools.

What’s Inside AgentKit?

  1. Agent Builder
    A visual canvas (think Canva for agents) where you drag-and-drop logic, tools, and workflows. It supports versioning, preview runs, and inline evaluations.
  2. ChatKit
    Embeddable chat interfaces that feel native to your product. Canva built a support agent in under an hour using ChatKit.
  3. Connector Registry
    A central admin panel to manage secure connections to tools like Dropbox, Google Drive, SharePoint, and Microsoft Teams.
  4. Evals for Agents
    Performance evaluation tools including:
    • Datasets
    • Trace grading
    • Automated prompt optimisation
    • Third-party model support

Ramp built a procurement agent in just a few hours. LY Corporation orchestrated a multi-agent workflow in under two. Klarna’s support agent now handles two-thirds of all tickets.

Agent Builder: Drag, Drop, Deploy

Before AgentKit, building agents meant juggling orchestration scripts, custom connectors, manual prompt tuning, and frontend work. Now, you can:

  • Start with a blank canvas or prebuilt templates
  • Collaborate across product, legal, and engineering
  • Slash iteration cycles by 70%
  • Go from idea to live agent in two sprints instead of two quarters

It’s not just faster; it’s collaborative, visual, and version-controlled. It’s still in beta, but it’s already transforming workflows.
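To picture what “visual and version-controlled” could mean underneath the canvas, here is a minimal sketch in plain Python. It is not Agent Builder’s actual data model or API; `Node`, `Workflow`, `publish`, and `preview` are hypothetical names, and the sketch assumes a workflow is simply an ordered chain of blocks with numbered snapshots.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """One block on the canvas: a name and the function it applies."""
    name: str
    run: Callable[[str], str]


@dataclass
class Workflow:
    """Hypothetical workflow: an ordered chain of nodes plus saved versions."""
    nodes: List[Node] = field(default_factory=list)
    versions: Dict[int, List[str]] = field(default_factory=dict)

    def add(self, node: Node) -> None:
        self.nodes.append(node)

    def publish(self) -> int:
        """Save the current node order as a new numbered version."""
        version = len(self.versions) + 1
        self.versions[version] = [node.name for node in self.nodes]
        return version

    def preview(self, text: str) -> str:
        """Preview run: pass a sample input through each node in order."""
        for node in self.nodes:
            text = node.run(text)
        return text


wf = Workflow()
wf.add(Node("trim", str.strip))
wf.add(Node("classify", lambda t: "refund" if "refund" in t.lower() else "other"))
v1 = wf.publish()

wf.add(Node("route", lambda label: f"queue:{label}"))
v2 = wf.publish()

print(wf.versions[v1])                     # ['trim', 'classify']
print(wf.preview("  I want a REFUND  "))   # queue:refund
```

Each `publish` call records what the workflow looked like at that moment, so an earlier version can still be inspected after new blocks are added.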

ChatKit: Embedding Conversational Agents Made Easy

Deploying chat UIs used to be a nightmare: streaming responses, managing threads, designing UI. ChatKit fixes that.

  • Embed chat agents into apps or websites
  • Customise branding and workflows
  • Handle complex interactions with ease

HubSpot, Canva, and others are already using ChatKit to power support, onboarding, and research agents. It’s fast, flexible, and built for scale.

Evals: Because Performance Matters

Building agents is one thing. Making sure they work reliably is another. That’s where Evals for Agents comes in.

  • Datasets: Build and expand evals with automated graders
  • Trace grading: Assess workflows step-by-step
  • Prompt optimisation: Improve prompts based on feedback
  • Third-party model support: Evaluate models beyond OpenAI
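To make the first two bullets concrete, here is a minimal sketch in plain Python of a dataset scored by an automated grader, plus a step-by-step trace check. It does not use the OpenAI Evals API; `EvalCase`, `exact_match_grader`, `run_eval`, `grade_trace`, `toy_agent`, and the sample data are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    """One dataset row: a prompt and the answer we expect back."""
    prompt: str
    expected: str


def exact_match_grader(output: str, expected: str) -> float:
    """Automated grader: 1.0 if the normalised strings match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(agent: Callable[[str], str], dataset: List[EvalCase]) -> float:
    """Run the agent on every case and return the mean grade."""
    scores = [exact_match_grader(agent(case.prompt), case.expected) for case in dataset]
    return sum(scores) / len(scores)


def grade_trace(trace: List[Dict], required_tools: List[str]) -> Dict[str, bool]:
    """Trace grading: for each required tool, did some step call it successfully?"""
    return {
        tool: any(step["tool"] == tool and step["ok"] for step in trace)
        for tool in required_tools
    }


# Hypothetical stand-in for a real agent: a lookup table with one wrong answer.
CANNED_ANSWERS = {"Capital of France?": "Paris", "2 + 2?": "5"}


def toy_agent(prompt: str) -> str:
    return CANNED_ANSWERS.get(prompt, "")


dataset = [EvalCase("Capital of France?", "paris"), EvalCase("2 + 2?", "4")]
trace = [{"tool": "search", "ok": True}, {"tool": "send_email", "ok": False}]

print(run_eval(toy_agent, dataset))                  # 0.5
print(grade_trace(trace, ["search", "send_email"]))  # {'search': True, 'send_email': False}
```

The dataset score says how often the final answer was right; the trace grade says which step of the workflow failed, which is usually the more useful thing to know when debugging an agent.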

Carlyle used Evals to cut development time by 50% and boost agent accuracy by 30%.

Reinforcement Fine-Tuning: Teaching AI to Think Better

OpenAI also introduced Reinforcement Fine-Tuning (RFT) for its o4-mini and GPT-5 models. This lets developers train models using custom reward functions, not just labelled data.

Why it matters:

  • Tailor models for nuanced tasks (e.g. legal reasoning, medical coding)
  • Use programmable graders to score outputs
  • Optimise for clarity, correctness, and domain-specific behaviour
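The difference from labelled data is that a grader can give partial credit. As a rough sketch (not OpenAI’s grader format), the function below scores a predicted set of codes against a reference set using F1; the name `f1_reward` and the sample codes are assumptions made for illustration.

```python
from typing import Iterable


def f1_reward(predicted: Iterable[str], reference: Iterable[str]) -> float:
    """Score a predicted set of labels against a reference set, from 0.0 to 1.0."""
    predicted_set, reference_set = set(predicted), set(reference)
    if not predicted_set and not reference_set:
        return 1.0  # nothing expected, nothing predicted
    true_positives = len(predicted_set & reference_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_set)
    recall = true_positives / len(reference_set)
    return 2 * precision * recall / (precision + recall)


# Sample data for illustration only.
reference = ["E11.9", "I10"]
print(round(f1_reward(["E11.9", "I10"], reference), 2))             # 1.0
print(round(f1_reward(["E11.9", "I10", "J45.909"], reference), 2))  # 0.8
print(round(f1_reward(["Z00.00"], reference), 2))                   # 0.0
```

An output with one extra code scores 0.8 rather than zero, so the training signal rewards getting closer instead of only rewarding a perfect match.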

Early adopters like Accordance AI and Ambience Healthcare have seen accuracy improvements of up to 39%.

Apps SDK: ChatGPT Becomes an Operating System

OpenAI didn’t stop at agents. They also launched the Apps SDK, turning ChatGPT into a full-blown app platform.

What You Can Do

  • Build apps that live inside ChatGPT
  • Use natural language to trigger actions (e.g. “Spotify, make a playlist”)
  • Embed interactive UIs like maps, video players, and forms
  • Connect to existing backends and user accounts
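As a toy illustration of the “Spotify, make a playlist” pattern, the sketch below routes a message to a registered handler by app name. This is not the Apps SDK: in ChatGPT the model presumably decides which app to call, whereas here a simple string split stands in for that step, and `register`, `dispatch`, and `spotify_handler` are hypothetical names.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry mapping a lower-cased app name to its handler.
APP_HANDLERS: Dict[str, Callable[[str], str]] = {}


def register(app_name: str):
    """Decorator that registers a handler under the given app name."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        APP_HANDLERS[app_name.lower()] = fn
        return fn
    return decorator


@register("Spotify")
def spotify_handler(request: str) -> str:
    # Stand-in for a call to the app's existing backend.
    return f"[spotify] handling: {request}"


def parse_invocation(message: str) -> Tuple[str, str]:
    """Split 'App, request' into (app, request)."""
    app, sep, request = message.partition(",")
    if not sep:
        raise ValueError("expected a message of the form 'App, request'")
    return app.strip().lower(), request.strip()


def dispatch(message: str) -> str:
    """Route the message to the named app, or report that it is missing."""
    app, request = parse_invocation(message)
    handler = APP_HANDLERS.get(app)
    if handler is None:
        return f"No app named '{app}' is installed."
    return handler(request)


print(dispatch("Spotify, make a playlist"))  # [spotify] handling: make a playlist
print(dispatch("Zillow, find a flat"))       # No app named 'zillow' is installed.
```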

Launch partners include Spotify, Canva, Coursera, Zillow, Booking.com, and Expedia. More are coming soon, including Uber, DoorDash, and Target.

Apps can appear:

  • Inline in chat
  • Fullscreen for complex tasks
  • Picture-in-picture for continuous engagement

It’s a seamless, conversational experience, and it’s already live for 800 million ChatGPT users.

What This Means for You

Developers

You’re no longer just coding; you’re designing systems. Codex is your pair programmer. AgentKit is your toolkit. ChatKit handles the frontend. You can go from idea to production in a weekend.

Founders & Creators

Prototyping AI-native products used to take months. Now it’s a hackathon. Visualise workflows, test them, and deploy with no backend required.

Professionals

Think of your job as a system. Researching, reporting, and managing are all workflows you can automate. Build an agent to do the work for you.

AI Building AI: The Feedback Loop Has Begun

Here’s the big picture: AI is now building the tools that build better AI. This feedback loop is going to compound innovation at a rate we’ve never seen before.

Codex wrote 80% of Agent Builder. Agent Builder lets you build agents that use Codex. The cycle is self-reinforcing, and it’s already live.

Pricing & Availability

  • Codex: Included with ChatGPT Plus, Pro, Team, Edu, and Enterprise
  • AgentKit: Standard API model pricing
  • Apps SDK: Available in preview; monetisation details coming soon
  • RFT: Available for verified organisations using o4-mini and GPT-5

Final Thoughts: The Future Is Drag-and-Drop

OpenAI’s DevDay 2025 wasn’t just a product showcase; it was a paradigm shift. We’re entering a world where building intelligent systems is as simple as dragging blocks on a canvas. Where AI agents can be embedded, evaluated, and optimised in hours. Where your co-worker might be Codex, and your product manager might be a workflow.

So whether you’re a developer, founder, or just someone curious about the future of tech, now’s the time to dive in. Play with Agent Builder. Embed ChatKit. Run Evals. The tools are here. The future is building itself.

And if you’re still wondering whether AI is moving too fast… well, it’s already writing its own tools. You might want to keep up.