Otago Daily Times

Can AI chatbot tech teach driverless cars common sense?

- ALICE PLEBE

A quick search on the internet will yield numerous videos showcasing the mishaps of driverless cars, often bringing a smile or laugh. But why do we find these behaviours amusing? It might be because they starkly contrast with how a human driver would handle similar situations.

Everyday situations that seem trivial to us can still pose significant challenges to driverless cars. This is because they are designed using engineering methods that differ fundamentally from how the human mind works. However, recent advancements in AI have opened up new possibilities. New AI systems with language capabilities, such as the technology behind chatbots like ChatGPT, could be key to making driverless cars reason and behave more like human drivers.

Research on autonomous driving gained significant momentum in the late 2010s with the advent of deep neural networks (DNNs), a form of artificial intelligence (AI) that involves processing data in a way that is inspired by the human brain. This enables the processing of traffic scenario images and videos to identify “critical elements”, such as obstacles.

Detecting these often involves computing a 3D box to determine the sizes, orientations, and positions of the obstacles. This process, applied to vehicles, pedestrians and cyclists, for example, creates a representation of the world based on classes and spatial properties, including distance and speed relative to the driverless car.

This is the foundation of the most widely adopted engineering approach to autonomous driving, known as “sense-think-act”. In this approach, sensor data is first processed by the DNN. The processed data is then used to predict obstacle trajectories. Finally, the system plans the car’s next actions.
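The three stages can be pictured as a chain of simple functions. The sketch below is only an illustration under loud assumptions: the names `Obstacle`, `sense`, `think`, `act` and `drive_step`, the constant-speed prediction, and the 5-metre safety gap are all hypothetical and do not describe any real vehicle's software.

```python
from dataclasses import dataclass


@dataclass
class Obstacle:
    kind: str          # class label, e.g. "cyclist"
    distance_m: float  # distance ahead of the car, in metres
    speed_mps: float   # speed relative to the car; negative means the gap is closing


def sense(detections):
    """Stand-in for the DNN stage: wrap raw detections as Obstacle objects."""
    return [Obstacle(*d) for d in detections]


def think(obstacles, horizon_s=2.0):
    """Predict each obstacle's gap after horizon_s seconds, assuming constant speed."""
    return [(o, o.distance_m + o.speed_mps * horizon_s) for o in obstacles]


def act(predictions, safe_gap_m=5.0):
    """Plan the next action: brake if any predicted gap is below safe_gap_m."""
    if any(gap < safe_gap_m for _, gap in predictions):
        return "brake"
    return "keep_speed"


def drive_step(detections):
    # The stages run strictly in sequence: sense, then think, then act.
    return act(think(sense(detections)))


# A cyclist 12 m ahead with the gap closing at 4 m/s (hypothetical values).
print(drive_step([("cyclist", 12.0, -4.0)]))
```

Because each stage only passes its output to the next, a fault can be traced to one function, which is the debugging benefit of the modular design. The same separation means the “sense” stage has no knowledge of what the car intends to do.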

While this approach offers benefits like easy debugging, the sense-think-act framework has a critical limitation: it is fundamentally different from the brain mechanisms behind human driving.

Lessons from the brain

Much about brain function remains unknown, making it challenging to apply intuition derived from the human brain to driverless vehicles. Nonetheless, various research efforts aim to take inspiration from neuroscience, cognitive science, and psychology to improve autonomous driving. A long-established theory suggests that “sense” and “act” are not sequential but closely interrelated processes. Humans perceive their environment in terms of their capacity to act upon it.

For instance, when preparing to turn left at an intersection, a driver focuses on specific parts of the environment and obstacles relevant to the turn. In contrast, the sense-think-act approach processes the entire scenario independently of current action intentions.

Waymo car in San Francisco

Another critical difference with humans is that DNNs primarily rely on the data they have been trained on. When exposed to a slightly unusual variation of a scenario, they might fail or miss important information.

Such rare, underrepresented scenarios, known as “long-tail cases”, present a major challenge. Current workarounds involve creating larger and larger training datasets, but the complexity and variability of real-life situations make it impossible to cover all possibilities.

As a result, data-driven approaches like sense-think-act struggle to generalise to unseen situations. Humans, on the other hand, excel at handling novel situations.

Thanks to a general knowledge of the world, we are able to assess new scenarios using “common sense”: a mix of practical knowledge, reasoning, and an intuitive understanding of how people generally behave, built from a lifetime of experiences.

In fact, driving for humans is another form of social interaction, and common sense is key to interpreting the behaviours of road users (other drivers, pedestrians, cyclists). This ability enables us to make sound judgements and decisions in unexpected situations.

Copying common sense

Replicating common sense in DNNs has been a significant challenge over the past decade, prompting scholars to call for a radical change in approach. Recent AI advancements are finally offering a solution.

Large language models (LLMs) are the technology behind chatbots such as ChatGPT and have demonstrated remarkable proficiency in understanding and generating human language. Their impressive abilities stem from being trained on vast amounts of information across various domains, which has allowed them to develop a form of common sense akin to ours. More recently, multimodal LLMs (which can respond to user requests in text, vision and video) like GPT-4o and GPT-4o mini have combined language with vision, integrating extensive world knowledge with the ability to reason about visual inputs.

These models can comprehend complex unseen scenarios, provide natural language explanations, and recommend appropriate actions, offering a promising solution to the long-tail problem.

In robotics, vision-language-action models (VLAMs) are emerging, combining linguistic and visual processing with actions from the robot. VLAMs are demonstrating impressive early results in controlling robotic arms through language instructions.

In autonomous driving, initial research is focusing on using multimodal models to provide driving commentary and explanations of motor planning decisions. For example, a model might indicate, “There is a cyclist in front of me, starting to decelerate,” providing insights into the decision-making process and enhancing transparency. The company Wayve has shown promising initial results in applying language-driven driverless cars at a commercial level.

Future of driving

While LLMs can address long-tail cases, they present new challenges. Evaluating their reliability and safety is more complex than for modular approaches like sense-think-act. Each component of an autonomous vehicle, including integrated LLMs, must be verified, requiring new testing methodologies.

Additionally, multimodal LLMs are large and demanding on a computer’s resources, leading to high latency (a delay in action or communication from the computer). Driverless cars need real-time operation, and current models cannot generate responses quickly enough. Running LLMs also requires significant processing power and memory, which conflicts with the limited hardware available in vehicles.
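A little arithmetic shows why latency matters: while the car waits for a response, it keeps moving. The sketch below converts a delay into distance travelled. The function name `distance_during_latency`, the 50 km/h speed and the delay values are hypothetical figures chosen for illustration, not measurements of any real model.

```python
def distance_during_latency(speed_kmh: float, latency_s: float) -> float:
    """Metres travelled while the car waits latency_s seconds for a decision."""
    speed_mps = speed_kmh * 1000 / 3600  # convert km/h to m/s
    return speed_mps * latency_s


# Hypothetical delays, for illustration only.
for latency in (0.1, 0.5, 1.0):
    metres = distance_during_latency(50, latency)
    print(f"{latency:.1f} s delay at 50 km/h: {metres:.1f} m travelled")
```

Under these assumptions, a half-second delay at 50 km/h corresponds to roughly 7 metres travelled before the car can react.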

Multiple research efforts are now focused on optimising LLMs for use in vehicles. It will take a few years before we see commercial driverless vehicles with common-sense reasoning on the streets.

However, the future of autonomous driving is bright. In AI models featuring language capabilities, we have a solid alternative to the sense-think-act paradigm, which is nearing its limits.

LLMs are widely considered the key to achieving vehicles that can reason and behave more like humans. This advancement is crucial, considering that approximately 1.19 million people die each year due to road traffic crashes.

Road traffic injuries are the leading cause of death for children and young adults aged 5 to 29. The development of autonomous vehicles with human-like reasoning could potentially reduce these numbers significantly, saving lives.

- theconversation.com

PHOTO: REUTERS
