GPT-4o by OpenAI: Key Features, Updates & All You Need to Know

Find out all you need to know about OpenAI's latest announcement, GPT-4o, including features such as voice, vision and text capabilities, and the all-new ChatGPT Desktop App.

Katie White

May 13, 2024


Today, OpenAI hosted a live Spring Update to announce their latest ChatGPT model, GPT-4o. In this guide, we’re going to be explaining all you need to know about the update, including the brand new features GPT-4o has to offer and when we can expect it to roll out.

OpenAI's Spring Update Livestream

OpenAI's Spring Update: Key Announcements

Mira Murati, Chief Technology Officer at OpenAI, started the presentation by reiterating OpenAI's mission to make the most advanced AI tools more freely available to everyone.

This led into the first update: today OpenAI is launching a ChatGPT desktop app, so users can bring ChatGPT into whatever they are working on. The app is designed to be simple to use, integrating seamlessly into user workflows.

Along with it, they have refreshed the user interface (UI). While the model itself is becoming more complex, OpenAI wants the experience of interacting with it to become easier and more natural, allowing users to focus on their collaboration with ChatGPT.

The spotlight of today's event was of course the announcement of GPT-4o, OpenAI's new flagship model, which will be available to all users in the coming weeks.

What is GPT-4o?

GPT-4o is designed to bring GPT-4-level intelligence to everyone, for free.

The new model boasts all-new capabilities across text, vision and audio, making it a huge leap in AI technology. As well as new capabilities, GPT-4o is built to feel much more natural to interact with.

For the last few years, OpenAI has focused on developing the chatbot's intelligence. Now, their focus has shifted to improving the model's ease of use.

“This is the future of interaction between ourselves and machines. GPT-4o is shifting that paradigm to the future of collaboration where it is much easier and more natural.” - Mira Murati

According to Murati, the process has been a complex one: unlike humans, AI has always struggled to naturally interpret things we take for granted, such as dialogue, background noise, multiple voices in a conversation and tone of voice.

That's why previous models have only ever offered fairly basic versions of 'Voice Mode', stitched together from separate transcription and text-to-speech steps. That pipeline introduced a lot of latency, breaking the immersion of collaborating with ChatGPT.

Now, GPT-4o is able to reason across voice, text and vision in real time. This means the chatbot can process and respond to mixed media queries, whether they come through spoken words, written text, or visual inputs, seamlessly.

With real-time responsiveness, this model dramatically reduces latency, making conversations with AI feel more like chatting with a human. These improvements ensure that responses are not only swift but also contextually aware, even accommodating rapid shifts in the direction of the conversation without losing the thread.
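To make the "mixed media" idea concrete for developers, here is a minimal sketch of a text-plus-image request to GPT-4o through the OpenAI Python SDK. The prompt and the example image URL are assumptions for illustration, and the real-time voice experience shown in the livestream lives in the ChatGPT apps rather than in this snippet.

```python
# Minimal sketch (not OpenAI's demo code): a mixed text + image query to GPT-4o
# via the OpenAI Python SDK. The image URL and prompt are hypothetical.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this photo in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```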

To demonstrate these capabilities live, Mira Murati welcomed two of OpenAI’s research leads, Mark Chen and Barret Zoph.

GPT-4o Voice Capabilities


Mark Chen introduced the live demo on audio capabilities, showing how the real-time conversational speech works.

To begin, Mark asked: "Hey ChatGPT, I'm Mark, how are you?"

Without hesitation, ChatGPT responded “Hi Mark! I’m doing great! How are you?” in a lively tone not dissimilar to that of a human.

He then explained that he was conducting a live demo and asked for help to calm his nerves. GPT said: “Oh… you’re doing a demo right now? That’s awesome! Just take a deep breath and remember, you’re the expert here.”

To really put the model to the test, Mark asked for feedback on his breathing, then deliberately breathed quickly and erratically. GPT instantly picked up on this and even cracked a joke before guiding his breathing further: "Woah! Slow down a bit there, Mark. You're not a vacuum cleaner! Breathe in to the count of four, then exhale slowly."

Once he had slowed down, GPT confirmed that his breathing was correct.

Mark then took the opportunity to explain the differences between the old voice mode and GPT-4o’s new audio capabilities.

“In the past, you used to have to wait for GPT to finish speaking before you could continue the conversation. Now you can interrupt it and butt in for quicker interactions. The model is also real-time responsive, meaning you don’t have the awkward 2-3 second lag whilst waiting for a response.”

The model can also pick up on emotion. As demonstrated with the breathing exercise, GPT could tell that Mark was breathing too quickly and needed to calm down.

Not only does GPT-4o recognise tone, it can also generate voice in a variety of different emotive styles with a wide dynamic range.

In a second demo, Mark asked GPT for a bedtime story. After the first few lines, he requested more emotion and drama. Immediately, the chatbot became more dramatic as instructed.

Mira Murati then tasked the model with continuing the story in a robotic voice. GPT obliged, implementing the robotic voice while keeping the dramatic tone, which greatly impressed the live audience.

Finally, they both asked the model to finish the story in a singing voice, which it did perfectly.

GPT-4o Vision Capabilities

Next up, the team demonstrated GPT-4o's vision capabilities, which allow users to upload photos and documents containing text and images, as well as share real-time video, to start conversations about the content.

Barret Zoph asked the model for help solving a linear equation. He asked not to be given the solution, but to be given hints along the way.

Zoph then wrote the equation '3x + 1 = 4' on a piece of paper and showed it to GPT through his camera.

The model immediately recognised the equation and, as a teacher would, gave helpful hints along the way to help Zoph reach the answer. Not only did it offer hints, it also prompted him with questions such as "OK, so what do we get when we subtract 1 from both sides?"
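For readers following along, the worked solution is short: subtracting 1 from both sides leaves 3x = 3, and dividing both sides by 3 gives x = 1.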

After successfully completing the equation, Zoph wrote 'I heart ChatGPT'. Without hesitation, ChatGPT responded: "Aww! I see you wrote I love ChatGPT! That's so sweet of you!"

GPT-4o Text Capabilities


Next, Zoph demonstrated the use of the desktop app to solve more complex problems. On his laptop screen was some code, with the ChatGPT voice app open alongside it.

At this point, the model couldn't see anything on the screen and could only go off Zoph's voice. To begin with, he pasted the code into ChatGPT and asked for a brief description of what it does.

"This code fetches daily weather data for a specific location and time period, smooths the temperature data using a rolling average, annotates a significant weather event on the resulting plot, and then displays the plot with the average, minimum and maximum temperatures over the year," ChatGPT answered.

He then asked about a specific function within the code, and what the plot would look like with and without it. GPT explained both cases correctly and clearly.

Finally, Zoph ran the code and shared his screen with GPT. He asked for an overview of what it could see and even quizzed it on specific data points on the graph, which it answered immediately and accurately.
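OpenAI didn't publish the demo script itself, but based on ChatGPT's description above, a minimal sketch of that kind of code might look like the following. The synthetic data, column names and the "heatwave" annotation are assumptions standing in for whatever weather source the demo actually used.

```python
# Hypothetical reconstruction of the kind of script shown in the demo:
# generate a year of daily temperatures, smooth them with a rolling average,
# annotate a significant weather event and plot average/min/max values.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic daily weather data for one year (stand-in for a real data fetch)
dates = pd.date_range("2023-01-01", "2023-12-31", freq="D")
seasonal = 15 + 10 * np.sin(2 * np.pi * (dates.dayofyear - 80) / 365)
df = pd.DataFrame(
    {
        "avg_temp": seasonal + np.random.normal(0, 2, len(dates)),
        "min_temp": seasonal - 5 + np.random.normal(0, 2, len(dates)),
        "max_temp": seasonal + 5 + np.random.normal(0, 2, len(dates)),
    },
    index=dates,
)

# Smooth the temperature data with a 7-day rolling average
smoothed = df.rolling(window=7, center=True).mean()

# Plot average, minimum and maximum temperatures over the year
fig, ax = plt.subplots(figsize=(10, 4))
for column in ["avg_temp", "min_temp", "max_temp"]:
    ax.plot(smoothed.index, smoothed[column], label=column)

# Annotate a significant weather event (hypothetical date and label)
event_day = pd.Timestamp("2023-07-15")
ax.axvline(event_day, color="red", linestyle="--")
ax.text(event_day, float(df["max_temp"].max()), "Heatwave", color="red")

ax.set_ylabel("Temperature (°C)")
ax.set_title("Daily temperatures, 7-day rolling average")
ax.legend()
plt.show()
```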

Other GPT-4o Capabilities

GPT-4o now offers Memory, which aims to improve continuity across conversations, making ChatGPT more useful and helpful for long-term projects.

In the past, ChatGPT could only provide information from training data with a 2021 cut-off. Now, you'll also be able to use 'Browse' to search for real-time information within your conversation.

What's more, with advanced data analysis, you can upload all sorts of data, such as charts, and have ChatGPT analyse it and answer any questions or problems you may have.

OpenAI has also improved ChatGPT's quality and speed in 50 different languages, which they explained is important for bringing the experience to as many people as possible. As part of the live demo, Mira Murati and Mark Chen held a conversation in Italian and English, with ChatGPT translating both sides in real time.
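For a rough idea of how that translation behaviour maps onto the API, here is a minimal text-only sketch. The live demo used spoken Italian and English in the ChatGPT app, so the prompt wording and the text-based approach here are illustrative assumptions.

```python
# Minimal sketch: asking GPT-4o to translate between Italian and English
# through the API. Text-only; the livestream demo used live voice instead.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are a translator. Translate Italian messages into "
                       "English and English messages into Italian.",
        },
        {"role": "user", "content": "Ciao, come stai oggi?"},
    ],
)

print(response.choices[0].message.content)  # expected: an English translation
```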

Broader Accessibility and Open AI’s Vision for the Future


OpenAI’s mission has always been clear: to make advanced AI tools universally accessible. The introduction of GPT-4o is a huge step towards this.

Today, over 100 million people use ChatGPT to create, work and learn. Until now, the more advanced tools have only been available to paid users; now, these tools are becoming available to everyone.

This includes features such as GPTs. Over 1 million users have created custom experiences with GPTs and shared them through the GPT Store. Opening this up to all users gives those custom GPTs a much wider audience.

Although all users are gaining access to GPT-4o, paid users will still have up to 5x the message limits of free users. In the API, GPT-4o will also:

  • Be 2x faster

  • Cost 50% less

  • Have 5x higher rate limits compared to GPT-4 Turbo

As OpenAI brings these technologies into the world, they explain that it has been difficult finding a way to do so effectively and safely. They acknowledge that dealing with real time audio and vision presents new challenges, which is why the team has been hard at work building mitigations against misuse.

They are also continuing to work with governments, media, entertainment and civil society to figure out how best to do so. That's why GPT-4o will be released iteratively over the coming weeks.

GPT-4o FAQs


What is GPT-4o?

GPT-4o is the latest flagship model released by OpenAI, featuring improvements in speed, intelligence, and multi-modal capabilities. It brings GPT-4 level intelligence to a broader audience, including free users, and integrates seamlessly across text, vision, and audio inputs.

Who can use GPT-4o?

GPT-4o is designed to be accessible to everyone. OpenAI has made this model available to both free and paid users, ensuring that advanced AI tools are accessible to a wide range of users, from individuals to large organisations.

When will GPT-4o be rolled out?

Chief Technology Officer Mira Murati said GPT-4o will be rolled out iteratively over the next few weeks as OpenAI continues to work towards safe and secure real-time audio and visual capabilities.

When will the ChatGPT Desktop App be available?

OpenAI launched the ChatGPT Desktop App today, Monday 13 May 2024, starting with macOS; wider availability will follow.

What are the new features of GPT-4o?

GPT-4o introduces several new features, including increased processing speed, the ability to handle complex interactions across different modalities (text, vision, audio), and reduced latency in responses. It also arrives alongside a new desktop app and a refreshed web UI to enhance the user experience.

How does GPT-4o improve user interaction?

GPT-4o improves user interaction by providing more natural and intuitive communication. It reduces the latency typically associated with AI responses and can understand and process a mixture of voice, text, and visual information in real-time.

Is GPT-4o safe to use?

OpenAI has implemented rigorous safety measures in GPT-4o, focusing on preventing misuse and ensuring ethical use of the technology. The organisation continues to collaborate with various stakeholders, including government and civil society, to enhance the safety features as the technology evolves.

How does GPT-4o support real-time translation?

GPT-4o can perform real-time translation between multiple languages, facilitating seamless communication in diverse linguistic contexts. This capability was demonstrated during the livestream, showing how the model could translate conversations between Italian and English instantly.

Can GPT-4o recognise emotions?

Yes, GPT-4o can recognise and respond to emotional cues in the user's voice. This was showcased in the demonstration where the AI provided real-time feedback to help calm a user’s nerves, indicating its ability to perceive and react to human emotions.

For more of the latest industry updates and insights, don't miss out on our monthly newsletter!