Advancing voice intelligence with new models in the API

Explore new realtime voice models in the OpenAI API that can reason, translate, and transcribe speech, enabling more natural and intelligent voice experiences.

We’re introducing three audio models in the API that unlock a new class of voice apps for developers.

Key details

With these models, developers can build voice experiences that feel more natural, respond more intelligently, and take action in real time:

- GPT‑Realtime‑2, our first voice model with GPT‑5‑class reasoning that can handle harder requests and carry the conversation forward naturally.

- GPT‑Realtime‑Translate, a new live translation model that translates speech from 70+ input languages into 13 output languages while keeping pace with the speaker.

- GPT‑Realtime‑Whisper, a new streaming speech-to-text model that transcribes speech live as the speaker talks.
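As a rough sketch of how a developer might target one of these models, a realtime session is typically configured by sending a JSON event over a WebSocket connection. The snippet below only builds such an event payload; the `session.update` event shape and the `gpt-realtime-2` model identifier are assumptions for illustration, not confirmed API details:

```python
import json

def build_session_update(model: str, instructions: str, voice: str = "alloy") -> str:
    """Build a hypothetical `session.update` event for a realtime session.

    The model identifier and field names below are assumptions based on the
    announcement; check the API reference for the exact event schema.
    """
    event = {
        "type": "session.update",
        "session": {
            "model": model,
            "instructions": instructions,
            "voice": voice,
            "modalities": ["audio", "text"],  # request both spoken and text output
        },
    }
    return json.dumps(event)

# Example: configure a session for the reasoning-capable voice model.
payload = build_session_update(
    model="gpt-realtime-2",  # assumed API ID for GPT‑Realtime‑2
    instructions="You are a helpful, natural-sounding voice assistant.",
)
print(json.loads(payload)["session"]["model"])
```

In practice this payload would be sent over an authenticated WebSocket connection rather than printed; building it as a plain dict first keeps the session configuration easy to inspect and test.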

Try GPT-Realtime-2

What can I ask?

After you start the session, try saying one of these:

- I’m hosting a last-minute dinner tonight. I have 30 minutes, two vegetarian friends, one mushroom-hater, and a tiny kitchen. Help me plan a simple menu.

- I’m welcoming guests to a live event in Japan. Say a warm, natural welcome in Japanese — like a host kicking off something special.

- My order number is Orbit-742Q. Repeat it back clearly so I can confirm it’s right.

- Help me practice telling my team we hit our launch milestone. First say it with quiet confidence, then with more excitement.

- I’m planning trivia for a road trip. Give me three trick questions that sound deceptively simple, then explain each answer in one sentence.
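Outside the hosted demo, a prompt like the ones above would be delivered to a realtime session as a user message event. The sketch below builds one such event; the `conversation.item.create` shape is an assumption shown for illustration only:

```python
import json

def build_user_message(text: str) -> str:
    """Build a hypothetical `conversation.item.create` event carrying one of
    the sample prompts as user text (event shape is assumed, not confirmed)."""
    event = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        },
    }
    return json.dumps(event)

prompt = "My order number is Orbit-742Q. Repeat it back clearly so I can confirm it's right."
event = json.loads(build_user_message(prompt))
print(event["item"]["role"])  # → user
```

A client would follow this with a response-request event so the model speaks its reply; here the point is simply that each spoken or typed prompt becomes one structured conversation item.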
