OpenAI Unveils GPT-4o: A Step Closer to Human-Like AI Interaction

At the OpenAI presentation in San Francisco, a new version of the generative artificial intelligence language model, GPT-4o, was introduced. Developers tout it as a leap forward towards more natural human-computer interaction. This model can process any combination of text, audio, and visual data, generating similar combinations in responses. And most importantly, in its interactions, GPT-4o now more closely resembles a human.


The presentation primarily showcased the new model's voice capabilities. While voice interaction existed before, the response delay has dropped significantly, now averaging 320 milliseconds, comparable to human reaction time (in previous versions of GPT, it ranged from 2.8 to 5.4 seconds). Even though the developers frequently interrupted ChatGPT during the demonstration, this had no effect on the quality of its responses.

Programmer Robert Lukoshko noted that most of GPT-4o's responses start with introductory filler phrases, speculating that a separate, simpler model generates them while the new version prepares the full response. In this way, the developers would not only have created the appearance of an instant response but also brought GPT-4o closer to how humans actually converse. However, Lukoshko soon revised his opinion after watching a video in which two models sing, each continuing the other's phrases.

Indeed, GPT-4o can sing, change its voice intonation (on request, the chatbot deliberately makes it more dramatic or more robotic), and recognize user emotions. It can also analyze visual information. During the presentation, the model read an equation written on paper through a smartphone camera, offered hints for solving it, and corrected the user when they proposed an incorrect solution.

The model can also function as a real-time translator. The company's chief technology officer, Mira Murati, conversed with one of the developers in Italian while he responded in English; ChatGPT recognized the phrases and immediately translated them into the other language.

All these features, combined with the updated interface, are reminiscent of the futuristic movie "Her," in which Joaquin Phoenix's character falls in love with an AI voiced by Scarlett Johansson, as The Verge noted. OpenAI CEO Sam Altman (who did not appear at the presentation) hinted at the similarity in a one-word tweet: the movie's title.

Additionally, the creators showcased a new ChatGPT application for macOS, which lets users not only talk to the voice assistant but also share what is on their screen by pressing a key combination. During the presentation, the model recognized code on the screen, explained what it does, and clarified the purpose of one of its functions.

This functionality extends beyond programming. When GPT-4o was shown several monthly temperature graphs, it was able to analyze and describe them and answer follow-up questions. The official press release suggests that communication with ChatGPT will become even more natural in the future: for example, a user could show it a live broadcast of a sports game and ask it to explain the rules.

Users with a ChatGPT Plus subscription have already begun receiving access to the application, with a wider release planned for the coming weeks. A Windows version of the application is expected later this year.

The GPT-4o model itself will also be available for free, though paid subscribers will get somewhat expanded capabilities. ChatGPT already runs on the new GPT-4o model, but for now only for text and images. Voice capabilities will soon be available to a limited number of users, as OpenAI plans to roll out the new features gradually.

Early users who have explored GPT-4o's capabilities describe them as nothing short of "mind-blowing" (in the best sense). For example, working with graphs and data visualizations now takes less than 30 seconds.

Meanwhile, until the voice features arrive, users are joking about the drop in the stock price of the language-learning platform Duolingo, which occurred shortly after the launch of GPT-4o.
