Bar exam score shows AI can keep up with 'human lawyers,' researchers say

When ChatGPT came out in November 2022, it took the world by storm.

Within two months of its release, an estimated 100 million people had used the viral AI chatbot for everything from writing high school essays to planning travel itineraries to generating computer code.

Built by the San Francisco-based startup OpenAI, the app was flawed in many ways, but it also sparked a wave of excitement (and fear) about the transformative power of generative AI to change the way we work and create.

ChatGPT, which runs on a technology called GPT-3.5, has been so impressive in part because it represents a quantum leap over the capabilities of GPT-2, its predecessor from just a few years ago.

On Tuesday, OpenAI released an even more advanced version of its technology: GPT-4. The company says this update is another milestone in the advancement of AI. The new technology has the potential to improve how people learn new languages, how blind people process images, and even how we do our taxes.

OpenAI also claims that the new model supports a chatbot that is more factual, creative, and concise, and that can understand images, not just text.

Sam Altman, the CEO of OpenAI, called GPT-4 “our most capable and aligned model yet.” He also cautioned that “it is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.”

In a livestreamed demo of GPT-4 on Tuesday afternoon, OpenAI co-founder and president Greg Brockman showed some new use cases for the technology, including taking a hand-drawn mockup of a website and generating code for a functional site from it in a matter of seconds.

Brockman also showcased GPT-4’s visual capabilities by feeding it a cartoon image of a squirrel holding a camera and asking it to explain why the image is funny.

“The image is funny because it shows a squirrel holding a camera and taking a photo of a nut as if it were a professional photographer. It’s a humorous situation because squirrels typically eat nuts, and we don’t expect them to use a camera or act like humans,” GPT-4 responded.

This is the sort of capability that could be incredibly useful to people who are blind or visually impaired. Not only can GPT-4 describe images, but it can also communicate the meaning and context behind them.
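At launch, this kind of image input was available only to select partners like Be My Eyes, not through the public API. For readers curious what a multimodal request looks like in code, here is a minimal sketch using OpenAI's current Python SDK; the model name, image URL, and prompt below are illustrative assumptions, not part of OpenAI's GPT-4 announcement.

```python
# Sketch of a multimodal chat request with OpenAI's Python SDK (v1.x).
# Assumptions: a vision-capable model name and a reachable image URL;
# image input was not part of the public API when GPT-4 first launched.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model; illustrative choice
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Explain why this image is funny."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/squirrel-photographer.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```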

Still, as Altman and GPT-4’s creators have been quick to admit, the tool is nowhere near fully replacing human intelligence. Like its predecessors, it has known problems around accuracy, bias, and context. That poses a growing risk as more people start using GPT-4 for more than just novelty. Companies like Microsoft, which invests heavily in OpenAI, are already starting to bake GPT-4 into core products that millions of people use.

Here are a few things you need to know about the latest version of the buzziest new technology on the market.

One tangible way people are measuring the capabilities of new artificial intelligence tools is by seeing how well they can perform on standardized tests, like the SAT and the bar exam.

GPT-4 has shown some impressive progress here. The technology can pass a simulated legal bar exam with a score that would put it in the top 10 percent of test takers, while its immediate predecessor GPT-3.5 scored in the bottom 10 percent (watch out, lawyers).

GPT-4 can also score a 700 out of 800 on the SAT math test, compared with GPT-3.5’s 590.

[Chart: Sample of simulated exam results of GPT-4 compared with GPT-3.5. Source: OpenAI]

Still, GPT-4 is weak in certain subjects. It scored only a 2 out of 5 on the AP English Language exam, the same score its predecessor, GPT-3.5, received.

Standardized tests are hardly a perfect measure of human intelligence, but GPT-4’s performance on tests that demand reasoning and critical thinking shows the technology improving at an impressive clip.

Since GPT-4 just came out, it will take time before people discover all of the most compelling ways to use it, but OpenAI has proposed a couple of ways the technology could potentially improve our daily lives.

One is for learning new languages. OpenAI has partnered with the popular language learning app Duolingo to power a new AI-based chat partner called Roleplay. This tool lets you have a free-flowing conversation in another language with a chatbot that responds to what you’re saying and steps in to correct you when needed.

Another big use case that OpenAI pitched involves helping people who are visually impaired. In partnership with Be My Eyes, an app that lets visually impaired people get on-demand help from a sighted person via video chat, OpenAI used GPT-4 to create a virtual assistant that can help people understand the context of what they’re seeing around them. One example OpenAI gave showed how, given an image of the contents of a refrigerator, the app can offer recipes based on what’s available. The company says that’s an advancement from the current state of technology in the field of image recognition.

“Basic image recognition applications only tell you what’s in front of you,” said Jesper Hvirring Henriksen, CTO of Be My Eyes, in a press release for GPT-4’s launch. “They can’t have a discussion to understand if the noodles have the right kind of ingredients or if the object on the ground isn’t just a ball, but a tripping hazard — and communicate that.”

Right now, you’ll have to pay $20 per month for access to ChatGPT Plus, a premium version of the ChatGPT bot. GPT-4’s API is also available to developers, who can build apps on top of it and pay fees proportional to how much they use the tool.
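For developers, access works through the same chat-style endpoint that powers ChatGPT. As a minimal sketch, assuming the openai Python package and an API key with GPT-4 access (the prompts and temperature setting here are illustrative):

```python
# Minimal text-only GPT-4 API call with OpenAI's Python SDK (v1.x).
# Assumes OPENAI_API_KEY is set and the key has GPT-4 access.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize GPT-4's headline improvements in two sentences."},
    ],
    temperature=0.7,  # illustrative sampling setting
)
print(response.choices[0].message.content)
```

Billing is per token across both the prompt and the response, which is why costs scale with how heavily an app uses the model.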

However, if you want a taste of GPT-4 without paying up, you can use a Microsoft-made chatbot called BingGPT. A Microsoft VP confirmed on Tuesday that the latest version of BingGPT runs on GPT-4. Note that BingGPT caps how many conversations you can have per day, and it doesn’t let you input images.

While GPT-4 has clear potential to help people, it’s also inherently flawed. Like previous versions of generative AI models, GPT-4 can relay misinformation or be misused to share controversial content, like instructions on how to cause physical harm or content to promote political activism.

OpenAI says that GPT-4 is 40 percent more likely than GPT-3.5 to give factual responses, and 82 percent less likely to respond to requests for disallowed content. While that’s an improvement from before, there’s still plenty of room for error.

Another concern about GPT-4 is the lack of transparency about how it was designed and trained. Several prominent academics and industry experts on Twitter pointed out that the company isn’t releasing any information about the data set it used to train GPT-4. This is an issue, researchers argue, because the large datasets used to train AI chatbots can be inherently biased, as evidenced a few years ago by Microsoft’s Twitter chatbot, Tay. Within a day of its release, Tay gave racist answers to simple questions. It had been trained on social media posts, which can often be hateful.

OpenAI says it’s not sharing its training data in part because of competitive pressure. The company was founded as a nonprofit but became a for-profit entity in 2019, in part because of how expensive it is to train complex AI systems. OpenAI is now heavily backed by Microsoft, which is engaged in a fierce battle with Google over which tech giant will lead on generative AI technologies.

Without knowing what’s under the hood, it’s hard to immediately validate OpenAI’s claims that its latest tool is more accurate and less biased than before. As more people use the technology in the coming weeks, we’ll see if it ends up being not only meaningfully more useful but also more responsible than what came before it.
