OpenAI Debuts New GPT Model
OpenAI, an artificial intelligence research organization, recently announced GPT-4o, its new flagship model. The "o" stands for "omni," referring to the model's ability to handle text, speech, and video.
OpenAI CTO Mira Murati announced the launch on Monday, May 13, during a live-streamed demo, and the model was released the same day. The omnimodal generative pre-trained transformer (GPT) model is an advance over the previous iteration, GPT-4 Turbo, continuing the line of models that runs through GPT-3, GPT-3.5, and GPT-4.
While GPT-4o performs similarly to GPT-4 Turbo on English text and code, it shows notable improvements on text in non-English languages. It is also faster and 50 percent cheaper through the API, and it demonstrates stronger vision and audio comprehension than its predecessors.
Murati said GPT-4o provides "GPT-4-level" intelligence while improving capabilities across multiple modalities and media. The new model handles a range of novel tasks, including visual narratives, character design, poetic typography, photo caricatures, lecture summarization, concrete poetry, and 3D object synthesis. GPT-4 Turbo, previously OpenAI's most advanced model, was trained on a combination of text and images; GPT-4o adds speech to its capabilities. Murati emphasized that while the models are becoming more complex, the interaction experience should become more natural and easier.
GPT-4o is available to developers and users through the API, with greater speed, lower cost, and higher usage limits than its predecessors. ChatGPT has long included a voice option, but GPT-4o enhances it, making ChatGPT feel more like an assistant: users can ask questions and interrupt ChatGPT while it is responding. The model responds in real time, picks up on nuances in a user's voice, and can generate speech in a range of emotive styles, including singing.
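For developers, that API access looks much like calls to earlier GPT models. The snippet below is a minimal sketch, assuming the official openai Python SDK (v1 or later), an OPENAI_API_KEY set in the environment, and a hypothetical summarization prompt; it is an illustration, not OpenAI's documented example.

```python
# Minimal sketch of calling GPT-4o through the OpenAI API.
# Assumes the official `openai` Python SDK (v1+) is installed and
# OPENAI_API_KEY is set in the environment; the prompt is hypothetical.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # the new flagship model
    messages=[
        {"role": "user", "content": "Summarize the key points of this lecture: ..."},
    ],
)

print(response.choices[0].message.content)
```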
According to Murati, these capabilities will continue to develop. In the future, the model might let ChatGPT do things like watch a live sporting event and explain the rules. GPT-4o has improved performance in almost 50 languages, making it more multilingual, and in OpenAI's API and Microsoft's Azure OpenAI Service it is said to be twice as fast as GPT-4 Turbo, at half the price and with higher rate limits.
To address safety concerns, not all customers can use voice through the GPT-4o API at launch. Citing the risk of abuse, OpenAI plans to first offer GPT-4o's new audio capabilities to a small group of trusted partners in the coming weeks.
GPT-4o's text and image capabilities have already begun rolling out in ChatGPT. OpenAI is making GPT-4o available in the free tier and to Plus users with message limits up to five times higher. A new version of Voice Mode powered by GPT-4o is also planned to arrive in alpha within ChatGPT Plus in the coming weeks.