Bangkok Post

Must do better

The new ChatGPT offers a lesson in AI hype

- BRIAN X. CHEN

When OpenAI unveiled the latest version of its immensely popular ChatGPT chatbot this month, it had a new voice possessing humanlike inflections and emotions. The online demonstration also featured the bot tutoring a child on solving a geometry problem.

To my chagrin, the demo turned out to be essentially a bait and switch. The new ChatGPT was released without most of its new features, including the improved voice (which the company said it postponed to make fixes). The ability to use a phone’s video camera to get real-time analysis of something like a maths problem isn’t available yet, either.

Amid the delay, the company also deactivated the ChatGPT voice that some said sounded like actress Scarlett Johansson, after she threatened legal action, replacing it with a different female voice.

For now, what has actually been rolled out in the new ChatGPT is the ability to upload photos for the bot to analyse. Users can generally expect quicker, more lucid responses. The bot can also do real-time language translations, but ChatGPT will respond in its older, machine-like voice.

Nonetheless, this is the leading chatbot that upended the tech industry, so it was worth reviewing. After trying the sped-up chatbot for two weeks, I had mixed feelings. It excelled at language translations, but it struggled with maths and physics. All told, I didn’t see a meaningful improvement from the last version, ChatGPT-4. I definitely wouldn’t let it tutor my child.

This tactic, in which artificial intelligence companies promise wild new features and deliver a half-baked product, is becoming a trend that is bound to confuse and frustrate people. The US$700 (25,500 baht) Ai Pin, a talking lapel pin from the start-up Humane, which is funded by OpenAI’s CEO, Sam Altman, was universally panned because it overheated and spat out nonsense. Meta also recently added to its apps an AI chatbot that did a poor job at most of its advertised tasks, like web searches for plane tickets.

Companies are releasing AI products in a premature state partly because they want people to use the technology to help them learn how to improve it. In the past, when companies unveiled new tech products like phones, what we were shown — features like new cameras and brighter screens — was what we were getting. With AI, companies are giving a preview of a potential future, demonstrating technologies that are being developed and working only in limited, controlled conditions. A mature, reliable product might arrive — or might not.

The lesson to learn from all this is that we, as consumers, should resist the hype and take a slow, cautious approach to AI. We shouldn’t be spending much cash on any underbaked tech until we see proof that the tools work as advertised.

The new version of ChatGPT, called GPT-4o (o as in omni), is now free to try on OpenAI’s website and app. Nonpaying users can make a few requests before hitting a timeout, and those who have a $20 monthly subscription can ask the bot a larger number of questions.

OpenAI said its iterative approach to updating ChatGPT allowed it to gather feedback to make improvements.

“We believe it’s important to preview our advanced models to give people a glimpse of their capabilities and to help us understand their real-world applications,” the company said in a statement.

(The New York Times sued OpenAI and its partner, Microsoft, last year for using copyrighted news articles without permission to train chatbots.)

Here’s what to know about the latest version of ChatGPT.

GEOMETRY AND PHYSICS

To show off ChatGPT-4o’s new tricks, OpenAI published a video featuring Sal Khan, the CEO of the Khan Academy, the education nonprofit, and his son Imran. With a video camera pointed at a geometry problem, ChatGPT was able to talk Imran through solving it step by step.

Even though ChatGPT’s video analysis feature has yet to be released, I was able to upload photos of geometry problems. ChatGPT solved some of the easier ones correctly, but it tripped up on more challenging problems.

For one problem involving intersecting triangles, which I dug up on an exam preparation website, the bot understood the question but gave the wrong answer.

Taylor Nguyen, a US high school physics teacher in Orange County, California, uploaded a physics problem involving a man on a swing that is commonly included on Advanced Placement Calculus tests. ChatGPT made several logical mistakes and arrived at the wrong answer, but it was able to correct itself with feedback from Nguyen.

“I was able to coach it, but I’m a teacher,” he said. “How is a student supposed to pick out those mistakes? They’re making this assumption that the chatbot is right.”

I did notice that ChatGPT-4o succeeded at some division calculations that its predecessors got wrong, so there are signs of slow improvement. But it also failed at a basic maths task that past versions and other chatbots, including Meta AI and Google’s Gemini, have flunked: counting. When I asked ChatGPT-4o for a four-syllable word starting with the letter W, it responded: “Wonderful.”

REASONING

OpenAI also highlighted that the new ChatGPT was better at reasoning, or using logic to come up with responses. So I ran it through one of my favourite tests. I asked it to generate a Where’s Waldo? puzzle. When it showed an image of a giant Waldo standing in a crowd, I said that the point is that he’s supposed to be hard to find. The bot then generated an even larger Waldo.

Subbarao Kambhampati, a professor and researcher of artificial intelligence at Arizona State University, also put the chatbot through some tests and said he saw no noticeable improvement in reasoning compared with the last version.

He presented ChatGPT with a puzzle involving blocks.

If block C is on top of block A, and block B is separately on the table, can you tell me how I can make a stack of blocks with block A on top of block B and block B on top of block C, but without moving block C?

The answer is that it’s impossible to arrange the blocks under these conditions: block C sits on top of block A, so block A cannot be freed and placed on top of the stack without moving C. But, just as with past versions, ChatGPT-4o consistently came up with a solution that involved moving block C. With this and other reasoning tests, ChatGPT was occasionally able to reach the correct answer after feedback, but relying on the user to supply corrections is antithetical to how AI is supposed to work, Kambhampati said.

“You can correct it, but when you do that, you’re using your own intelligen­ce,” he said.

OpenAI pointed to test results that showed GPT-4o scored about 2 percentage points higher at answering general knowledge questions than previous versions of ChatGPT, illustrating that its reasoning skills had slightly improved.

LANGUAGE

OpenAI also said the new ChatGPT could do real-time language translation, which could help you converse with someone speaking a foreign language. I tested ChatGPT with Mandarin and Cantonese and confirmed that it was OK at translating phrases, such as “I’d like to book a hotel room for next Thursday” and “I want a king-size bed”. But the accents were slightly off. (To be fair, my broken Chinese is not much better.) OpenAI said it was still working to improve accents.

ChatGPT-4o also excelled as an editor. When I fed it paragraphs that I wrote, it was fast and effective at removing excessive words and jargon. ChatGPT’s decent performance with language translation gives me confidence that this will soon become a more useful feature.

BOTTOM LINE

A major thing OpenAI got right with ChatGPT-4o is making the technology free for people to try. Free is the right price. Since our data is helping to train and improve these AI systems, we shouldn’t be paying for them.

The best of AI has yet to come, and it might one day be a good maths tutor that we want to talk to. But we should believe it when we see it — and hear it.

A smartphone running ChatGPT.
