Can AI chatbot tech teach driverless cars common sense?
A quick search on the internet will yield numerous videos showcasing the mishaps of driverless cars, often bringing a smile or laugh. But why do we find these behaviours amusing? It might be because they starkly contrast with how a human driver would handle similar situations.
Everyday situations that seem trivial to us can still pose significant challenges to driverless cars. This is because they are designed using engineering methods that differ fundamentally from how the human mind works. However, recent advancements in AI have opened up new possibilities. New AI systems with language capabilities — such as the technology behind chatbots like ChatGPT — could be key to making driverless cars reason and behave more like human drivers.
Research on autonomous driving gained significant momentum in the late 2010s with the advent of deep neural networks (DNNs), a form of artificial intelligence (AI) that involves processing data in a way that is inspired by the human brain. This enables the processing of traffic scenario images and videos to identify "critical elements", such as obstacles.
Detecting these often involves computing a 3D box to determine the sizes, orientations, and positions of the obstacles. This process, applied to vehicles, pedestrians and cyclists, for example, creates a representation of the world based on classes and spatial properties, including distance and speed relative to the driverless car.
This is the foundation of the most widely adopted engineering approach to autonomous driving, known as "sense-think-act". In this approach, sensor data is first processed by a DNN to detect obstacles. Those detections are then used to predict obstacle trajectories. Finally, the system plans the car's next actions.
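The sense-think-act loop described above can be sketched in a few lines of code. This is a deliberately toy illustration, not a real autonomous-driving stack: the `Obstacle` type, the thresholds and the hard-coded "sensor frame" are all invented for this example, and real perception would be a trained DNN rather than a lookup.

```python
from dataclasses import dataclass

# Hypothetical obstacle representation for this sketch only.
@dataclass
class Obstacle:
    kind: str          # e.g. "cyclist", "vehicle", "pedestrian"
    distance_m: float  # distance ahead of the car, in metres
    speed_mps: float   # closing speed relative to the car, metres/second

def sense(raw_frame):
    """Stand-in for DNN perception: turn raw sensor data into obstacles."""
    return [Obstacle(o["kind"], o["d"], o["v"]) for o in raw_frame]

def think(obstacles, horizon_s=2.0):
    """Predict each obstacle's gap to the car after horizon_s seconds."""
    return [(o, o.distance_m - o.speed_mps * horizon_s) for o in obstacles]

def act(predictions, safe_gap_m=5.0):
    """Plan an action from the predicted gaps."""
    if any(gap < safe_gap_m for _, gap in predictions):
        return "brake"
    return "maintain_speed"

# One tick of the pipeline: sense -> think -> act.
frame = [{"kind": "cyclist", "d": 12.0, "v": 4.0}]
print(act(think(sense(frame))))  # -> brake (12 - 4*2 = 4 m, under the 5 m gap)
```

The key point is the one-way flow: each stage consumes the previous stage's output and has no influence on what the earlier stages attend to, which is exactly the property the article contrasts with human perception.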
While this approach offers benefits like easy debugging, the sense-think-act framework has a critical limitation: it is fundamentally different from the brain mechanisms behind human driving.
Lessons from the brain
Much about brain function remains unknown, making it challenging to apply intuition derived from the human brain to driverless vehicles. Nonetheless, various research efforts aim to take inspiration from neuroscience, cognitive science, and psychology to improve autonomous driving. A long-established theory suggests that "sense" and "act" are not sequential but closely interrelated processes. Humans perceive their environment in terms of their capacity to act upon it.
For instance, when preparing to turn left at an intersection, a driver focuses on specific parts of the environment and obstacles relevant to the turn. In contrast, the sense-think-act approach processes the entire scenario independently of current action intentions.
Another critical difference from humans is that DNNs primarily rely on the data they have been trained on. When exposed to a slightly unusual variation of a scenario, they might fail or miss important information.
Such rare, underrepresented scenarios, known as "long-tail cases", present a major challenge. Current workarounds involve creating larger and larger training datasets, but the complexity and variability of real-life situations make it impossible to cover all possibilities.
As a result, data-driven approaches like sense-think-act struggle to generalise to unseen situations. Humans, on the other hand, excel at handling novel situations.
Thanks to a general knowledge of the world, we are able to assess new scenarios using "common sense": a mix of practical knowledge, reasoning, and an intuitive understanding of how people generally behave, built from a lifetime of experiences.
In fact, driving for humans is another form of social interaction, and common sense is key to interpreting the behaviours of road users (other drivers, pedestrians, cyclists). This ability enables us to make sound judgements and decisions in unexpected situations.
Copying common sense
Replicating common sense in DNNs has been a significant challenge over the past decade, prompting scholars to call for a radical change in approach. Recent AI advancements are finally offering a solution.
Large language models (LLMs) are the technology behind chatbots such as ChatGPT and have demonstrated remarkable proficiency in understanding and generating human language. Their impressive abilities stem from being trained on vast amounts of information across various domains, which has allowed them to develop a form of common sense akin to ours. More recently, multimodal LLMs (which can respond to user requests in text, vision and video) like GPT-4o and GPT-4o mini have combined language with vision, integrating extensive world knowledge with the ability to reason about visual inputs.
These models can comprehend complex unseen scenarios, provide natural language explanations, and recommend appropriate actions, offering a promising solution to the longtail problem.
In robotics, vision-language-action models (VLAMs) are emerging, combining linguistic and visual processing with actions from the robot. VLAMs are demonstrating impressive early results in controlling robotic arms through language instructions.
In autonomous driving, initial research is focusing on using multimodal models to provide driving commentary and explanations of motor planning decisions. For example, a model might indicate, "There is a cyclist in front of me, starting to decelerate," providing insights into the decision-making process and enhancing transparency. The company Wayve has shown promising initial results in applying language-driven driverless cars at a commercial level.
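To make the idea of driving commentary concrete, here is a minimal sketch. In a real system the commentary would come from a vision-language model reading camera frames; in this toy version, invented for illustration, symbolic scene facts are simply templated into the kind of explanation such a model might produce.

```python
def drive_commentary(scene):
    """Toy stand-in for a multimodal model's driving commentary.

    `scene` is a list of (actor, motion) pairs, e.g.
    [("cyclist", "starting to decelerate")]. The trigger words below
    are an assumption of this sketch, not a real model's behaviour.
    """
    remarks = []
    for actor, motion in scene:
        remark = f"There is a {actor} in front of me, {motion}."
        # Attach the planned reaction so the decision is transparent.
        if "decelerat" in motion or "crossing" in motion:
            remark += " I will slow down."
        remarks.append(remark)
    return " ".join(remarks)

print(drive_commentary([("cyclist", "starting to decelerate")]))
# -> There is a cyclist in front of me, starting to decelerate. I will slow down.
```

The value of this pattern, whether templated or model-generated, is that every planned manoeuvre comes paired with a human-readable reason, which is what makes the system's decisions auditable.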
Future of driving
While LLMs can address long-tail cases, they present new challenges. Evaluating their reliability and safety is more complex than for modular approaches like sense-think-act. Each component of an autonomous vehicle, including integrated LLMs, must be verified, requiring new testing methodologies.
Additionally, multimodal LLMs are large and demanding on a computer's resources, leading to high latency (a delay in action or communication from the computer). Driverless cars need to operate in real time, and current models cannot generate responses quickly enough. Running LLMs also requires significant processing power and memory, which conflicts with the limited hardware available in vehicles.
Multiple research efforts are now focused on optimising LLMs for use in vehicles. It will take a few years before we see commercial driverless vehicles with commonsense reasoning on the streets.
However, the future of autonomous driving is bright. In AI models featuring language capabilities, we have a solid alternative to the sense-think-act paradigm, which is nearing its limits.
LLMs are widely considered the key to achieving vehicles that can reason and behave more like humans. This advancement is crucial, considering that approximately 1.19 million people die each year due to road traffic crashes.
Road traffic injuries are the leading cause of death for children and young adults aged 5-29. The development of autonomous vehicles with human-like reasoning could potentially reduce these numbers significantly, saving lives. — theconversation.com