GPT-4o Launches: Is the Real Version of Her Coming?
Over two days of back-to-back events, the industry's most advanced AI products will go head-to-head.
At 1 a.m. today, just ahead of the opening of Google I/O, Google's annual developer conference, OpenAI hosted its spring online livestream to announce a desktop version of ChatGPT and unveil its new flagship AI model, GPT-4o. GPT-4o is free and accessible to all users, enables real-time reasoning across text, audio, and vision (images and video), and is also available through the API, where it is priced at half the rate of GPT-4 Turbo and runs up to twice as fast. Paid ChatGPT Plus subscribers receive five times the usage limits, along with the earliest access to the new macOS desktop app and the next-generation voice and video features.
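For developers, access to GPT-4o through the API looks like an ordinary chat-completion call with the new model name. A minimal sketch, assuming the official `openai` Python SDK (v1+) and an `OPENAI_API_KEY` set in the environment:

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
client = OpenAI()

# GPT-4o is addressed by the model name "gpt-4o"; API usage is billed
# per token at roughly half the GPT-4 Turbo rate.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise what GPT-4o can do in one sentence."},
    ],
)

print(response.choices[0].message.content)
```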
OpenAI's enhancements to ChatGPT once again aim straight at the heart: its real-time voice translation is natural and seamless enough to rival a simultaneous interpreter. Beyond speed and accuracy, ChatGPT can now alter its tone of voice at will, from detached and impersonal to warm and engaging, and it can even break into song on demand, sounding indistinguishable from a human. GPT-4o is also capable of real-time video interaction: in the demos it worked through a linear equation shown to the camera, and it has learned to "read faces," gauging people's emotions from their facial expressions and tone of voice.
It can also look directly at your screen and respond to what it sees: shown a piece of code, it will identify and flag errors; shown a data chart, it will interpret the information presented. The launch event itself was brisk, lasting roughly half an hour, and the demos ran on a parade of Apple devices, hinting that a collaboration between OpenAI and Apple may be imminent. The new features will reach both free and paid users; the beta phase, which began today, is limited to ChatGPT Plus subscribers, with broader availability in the coming weeks. Text and image input are rolling out today, with voice and video functionality scheduled for the coming weeks.
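The screen- and chart-reading demos rest on the same image-understanding path the API already exposes for vision input. A minimal sketch of submitting a screenshot for code review, again assuming the `openai` Python SDK; the file name `screenshot.png` is purely illustrative:

```python
import base64

from openai import OpenAI

client = OpenAI()

# Illustrative file: a screenshot of code (or a data chart) to analyse.
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Image input is passed alongside the text prompt as a data URL.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Spot any bugs in the code shown in this screenshot."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```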
The 'o' in GPT-4o stands for 'omni', according to Murati, who added that GPT-4o brings GPT-4-level intelligence to every user while improving on GPT-4's text, visual, and audio capabilities. Previously, GPT-4 was trained on image and text data, enabling it to analyse images and text, extract text from images, or describe the contents of a screen. GPT-4o adds voice capabilities on top of this, bringing interaction with ChatGPT to a level comparable to human-to-human conversation. On English text and code, GPT-4o matches GPT-4 Turbo's performance, while on non-English text it is significantly better.
Murati said the release of GPT-4o marks a major step forward in the usability of large models, which are reshaping how humans and computers collaborate. She also discussed the difficulty of building AI that can handle the messiness of real human interaction, such as overlapping voices, background noise, and shifts in intonation.
She also noted that one of OpenAI's key objectives is to make advanced AI tools accessible to everyone. To that end, OpenAI is launching a desktop version of ChatGPT that slots easily into a user's workflow, and it has refreshed the user interface to make collaborating with ChatGPT simpler and more natural.