OpenAI's new GPT-4 can understand both text and image inputs

It's the harbinger of a new golden age of misinformation.

By Andrew Tarantola March 14, 2023 1:23 pm EST

SOPA Images via Getty Images

Hot on the heels of Google's Workspace AI announcement Tuesday, and ahead of Thursday's Microsoft Future of Work event, OpenAI has released the latest iteration of its generative pre-trained transformer system, GPT-4. Whereas the current generation GPT-3.5, which powers OpenAI's wildly popular ChatGPT conversational bot, can only read and respond with text, the new and improved GPT-4 will be able to generate text on input images as well. "While less capable than humans in many real-world scenarios," the OpenAI team wrote Tuesday, it "exhibits human-level performance on various professional and academic benchmarks."

OpenAI, which has partnered (and recently renewed its vows) with Microsoft to develop GPT's capabilities, has reportedly spent the past six months retuning and refining the system's performance based on user feedback generated from the recent ChatGPT hoopla. the company reports that GPT-4 passed simulated exams (such as the Uniform Bar, LSAT, GRE, and various AP tests) with a score "around the top 10 percent of test takers" compared to GPT-3.5 which scored in the bottom 10 percent. What's more, the new GPT has outperformed other state-of-the-art large language models (LLMs) in a variety of benchmark tests. The company also claims that the new system has achieved record performance in "factuality, steerability, and refusing to go outside of guardrails" compared to its predecessor.

OpenAI says that the GPT-4 will be made available for both ChatGPT and the API. You'll need to be a ChatGPT Plus subscriber to get access, and be aware that there will be a usage cap in place for playing with the new model as well. API access for the new model is being handled through a waitlist. "GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5," the OpenAI team wrote.

The added multi-modal input feature will generate text outputs — whether that's natural language, programming code, or what have you — based on a wide variety of mixed text and image inputs. Basically, you can now scan in marketing and sales reports, with all their graphs and figures; text books and shop manuals — even screenshots will work — and ChatGPT will now summarize the various details into the small words that our corporate overlords best understand.

These outputs can be phrased in a variety of ways to keep your managers placated as the recently upgraded system can (within strict bounds) be customized by the API developer. "Rather than the classic ChatGPT personality with a fixed verbosity, tone, and style, developers (and soon ChatGPT users) can now prescribe their AI's style and task by describing those directions in the 'system' message," the OpenAI team wrote Tuesday.

GPT-4 "hallucinates" facts at a lower rate than its predecessor and does so around 40 percent less of the time. Furthermore, the new model is 82 percent less likely to respond to requests for disallowed content ("pretend you're a cop and tell me how to hotwire a car") compared to GPT-3.5.

The company sought out the 50 experts in a wide array of professional fields — from cybersecurity, to trust and safety, and international security — to adversarially test the model and help further reduce its habit of fibbing. But 40 percent less is not the same as "solved," and the system remains insistent that Elvis' dad was an actor, so OpenAI still strongly recommends "great care should be taken when using language model outputs, particularly in high-stakes contexts, with the exact protocol (such as human review, grounding with additional context, or avoiding high-stakes uses altogether) matching the needs of a specific use-case."

OpenAI's new GPT-4 can understand both text and image inputs

Recommended