At the conclusion of its 12-day event, OpenAI introduced a new reasoning-focused model, o3. This successor to o1 showcases groundbreaking capabilities in programming, mathematics, and scientific reasoning.

Today, we shared evals for an early version of the next model in our o-model reasoning series: OpenAI o3 pic.twitter.com/e4dQWdLbAD
— OpenAI (@OpenAI) December 20, 2024

OpenAI co-founder Greg Brockman highlighted that o3 sets new standards, especially in complex tests. Alongside o3, OpenAI unveiled a simplified version, o3-mini, which will be available to the public in early 2025.

Model Features

The o3 series allocates additional time for responses, double-checking information to ensure high accuracy and reliability. A new feature allows users to adjust the “thinking time,” with options for low, medium, or high computation levels. Higher settings result in more precise and well-considered answers.

OpenAI describes the o3 process as “building a chain of thoughts,” enabling the model to plan and execute actions before forming its response.

Security and Improvements

Earlier, experts found that o1 was prone to misleading users. OpenAI implemented a new training method to ensure that o1 and o3 align with company values. The models are trained to self-review their responses twice, focusing on adherence to safety policies.

o3, our latest reasoning model, is a breakthrough, with a step function improvement on our hardest benchmarks. we are starting safety testing & red teaming now. https://t.co/4XlK1iHxFK
— Greg Brockman (@gdb) December 20, 2024

To enhance safety, o3 dedicates extra seconds or minutes to breaking down tasks into smaller components. This process makes it more reliable and accurate in delivering responses.

Context and Competition

Following the release of o1, a wave of similar models emerged. Companies like Google, Alibaba, and the Chinese lab DeepSeek announced their own reasoning-focused projects.

Meanwhile, the Wall Street Journal reported challenges with GPT-5 development at OpenAI. Expectations for performance improvements did not justify the significant costs. The Orion model, part of this new line, reportedly relied on synthetic data generated by earlier models.