OpenAI released GPT-4o, its newest flagship generative AI model, on Monday. The “o” stands for “omni,” a nod to the model’s ability to handle text, speech, and video. GPT-4o will roll out gradually across the company’s developer and consumer-facing products over the next few weeks.
OpenAI CTO Mira Murati said that GPT-4o provides “GPT-4-level” intelligence but improves on GPT-4 across many capabilities and modalities.
On Monday, during a livestreamed presentation at OpenAI’s offices in San Francisco, Murati said, “GPT-4o thinks across voice, text, and vision. And this is very important, because we’re looking at how people will interact with machines in the future.”
GPT-4 Turbo, OpenAI’s previous “leading” and “most advanced” model, was trained on a combination of images and text, and could analyze both to do things like extract text from images or describe their contents. GPT-4o adds speech to the mix.
What Does This Make Possible? A Lot
GPT-4o greatly improves the experience in ChatGPT, OpenAI’s AI-powered chatbot. The platform has long offered a voice mode that uses a text-to-speech model to read the chatbot’s responses aloud. GPT-4o takes this a step further, making it feel more like talking to an assistant.
People can, for instance, ask the GPT-4o-powered ChatGPT a question and interrupt it mid-answer. OpenAI says the model responds in “real time” and can even pick up on nuances in a user’s voice, generating voices in “a range of different emotive styles,” including singing.
GPT-4o also upgrades ChatGPT’s vision capabilities. Given a photo or a desktop screenshot, ChatGPT can now quickly answer questions like “What’s going on in this software code?” and “What brand of shirt is this person wearing?”
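For developers, the same image-understanding capability is exposed through OpenAI’s API, which the article returns to below. Here is a minimal sketch using OpenAI’s official Python SDK, assuming an API key is set in the environment; the screenshot URL is a hypothetical placeholder:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask GPT-4o a question about an image by mixing text and image
# content parts in a single user message.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's going on in this software code?"},
                # Hypothetical placeholder URL; a base64 data URL also works.
                {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```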
Murati says these features will evolve further in the future. Today, GPT-4o can look at a picture of a menu in a foreign language and translate it; down the line, the model could let ChatGPT, say, “watch” a live sports game and explain how it’s played.
“We want the interaction to feel more natural and easy, and we don’t want you to think about the UI at all. We want you to just think about working together with ChatGPT,” Murati said of the increasingly complex models. “Over the past two years, we’ve put a lot of effort into making these models smarter… but this is the first time we’re really making a big improvement in how easy it is to use.”
OpenAI also says that GPT-4o is more multilingual, with improved performance in around 50 languages. And in OpenAI’s API and Microsoft’s Azure OpenAI Service, the company says, GPT-4o is twice as fast as GPT-4 Turbo, half the price, and has higher rate limits.
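On the API side, GPT-4o is served through the same chat completions endpoint as GPT-4 Turbo, so trying it is largely a matter of changing the model name. A minimal sketch with OpenAI’s official Python SDK, using a menu-translation prompt inspired by the example above (the prompt text is illustrative):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Trying GPT-4o is a one-line change for existing chat completions
# code: swap the model name (e.g., from "gpt-4-turbo" to "gpt-4o").
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Translate this menu line to English: 'Confit de canard, pommes sarladaises'"},
    ],
)
print(response.choices[0].message.content)
```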
Voice isn’t currently part of the GPT-4o API for all customers, however. Citing the risk of misuse, OpenAI says it will first give “a small group of trusted partners” access to GPT-4o’s new audio capabilities in the coming weeks.
Starting today, GPT-4o is available in the free tier of ChatGPT and to subscribers of OpenAI’s premium ChatGPT Plus and Team plans, who get “5x higher” message limits. (OpenAI notes that when users hit the rate limit, ChatGPT will automatically switch to GPT-3.5, an older and less capable model.) The improved, GPT-4o-powered ChatGPT voice experience will arrive in alpha for Plus users in about a month, alongside enterprise-focused options.
In related news, OpenAI said it’s releasing a new version of ChatGPT for macOS and the web with a “more conversational” home screen and message layout. Users will also be able to ask questions via a keyboard shortcut or take and discuss screenshots. ChatGPT Plus users get first access to the macOS app starting today; a Windows version will arrive later this year.
ChatGPT’s free-tier users also now get access to the GPT Store, OpenAI’s library of third-party chatbots and the tools to build them, along with features previously locked behind a paywall: ChatGPT can “remember” preferences for future interactions, accept uploaded files and photos, and search the web for answers to timely questions.