Designing robust computer vision systems for autonomous vehicles

By Jovina Ang

SMU Office of Research – Even though the history of autonomous vehicles (AVs) can be traced back to the 1990s, most of the research on AVs has been conducted using test data in the laboratory.

The Robust AI Grand Challenge, organised in collaboration with the Future Systems and Technology Directorate, MINDEF Singapore, and DSO National Laboratories, aims to get researchers and scholars from Institutes of Higher Learning (IHLs) and Research Institutes (RIs) to develop innovative solutions to overcome the vulnerabilities of Artificial Intelligence (AI) models in Computer Vision (CV) systems for AVs.

Three teams have been selected to compete in the grand challenge – two from Nanyang Technological University (NTU) and one from the National University of Singapore (NUS). SMU Assistant Professor of Computer Science and Lee Kong Chian Fellow Xie Xiaofei will be competing as part of one of the NTU teams.

When asked what motivated him to compete in the challenge, Professor Xie replied: “When I was at NTU, I had the opportunity to collaborate with Professor Liu Yang and the other co-principal investigators on multiple research projects. Also, the fields of trustworthy software and AI are aligned with my research interests.”

He added: “Another motivational factor for me is the opportunity to build robust AI models and test them in the physical world.

“The accuracy of CV systems has been shown to be compromised by physical threats or attacks. I believe this is why the Grand Challenge has been designed to challenge researchers to design CV systems for AVs that can recover to at least 80 percent of their original accuracy following the onset of a physical threat or attack; for example, the sudden swerve of another car towards the AV. While this level of recovery has been obtained on certain test benchmarks, to date the threshold has not been attained with real-world data.”
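
The evaluation protocol behind this threshold is not spelled out in the article; as a minimal illustration, the recovery requirement can be read as a simple ratio of post-attack accuracy to original accuracy, as in the hypothetical sketch below (the numbers are assumptions, not results from the challenge).

```python
def recovery_ratio(clean_accuracy: float, post_attack_accuracy: float) -> float:
    """Fraction of the original (clean-condition) accuracy retained after an attack."""
    return post_attack_accuracy / clean_accuracy

# Hypothetical numbers: a perception model at 92% clean accuracy that recovers
# to 76% after a physical attack retains about 82.6% of its original accuracy,
# so it would meet the 80 percent recovery threshold described above.
clean, recovered = 0.92, 0.76
print(recovery_ratio(clean, recovered) >= 0.80)  # True
```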

He continued: “Other than meeting this threshold accuracy metric in the challenge, the teams are required to delve into three specific CV scenarios for autonomous driving systems: detecting objects, providing acceptable estimates of the depth and distance of objects from the AV, and categorising each pixel of an image for easy and accurate identification.”
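
These three scenarios correspond to the standard perception tasks of object detection, depth estimation, and semantic segmentation. The sketch below only illustrates the kind of output each task produces for a single camera frame (NumPy arrays with assumed shapes, not the challenge's actual data format).

```python
import numpy as np

H, W = 720, 1280                      # hypothetical camera resolution
image = np.zeros((H, W, 3), dtype=np.uint8)

# 1. Object detection: bounding boxes (x1, y1, x2, y2), class id, confidence.
detections = np.array([[350.0, 400.0, 520.0, 640.0, 2, 0.91]])   # e.g. one "car"

# 2. Depth estimation: per-pixel distance from the AV's camera, in metres.
depth_map = np.full((H, W), 30.0, dtype=np.float32)

# 3. Semantic segmentation: one class label per pixel (road, vehicle, pedestrian, ...).
segmentation = np.zeros((H, W), dtype=np.int64)

print(detections.shape, depth_map.shape, segmentation.shape)
```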

The research

Besides Professor Xie, there are six other researchers on this NTU team: Professor Liu Yang, Dr. Guo Qing, Assistant Professor Zhang Tianwei, Assistant Professor Chen Lyu, Assistant Professor Zhang Hanwang and Professor Dong Jin Song.

To secure a place in the grand challenge, the team has designed four work packages as part of its comprehensive research project, which commenced on July 1, 2023, and will run over three years. The first two years of the project will be spent on developing the optimal CV technology, while the third year will be focused on testing the technology in the field.

The four work packages

Work Package 1 is designed to provide a unified and comprehensive AV visual representation.

Currently, most of the research on AV representation is centred on uni-directional sensing and object detection in a static or specific situation. Existing AV representations often employ separate models to process and recognise different types of data, such as images, LiDAR (Light Detection and Ranging) signals obtained from three-dimensional (3D) laser scanning, or other visual sensing modalities.

While current approaches to AV visual representation work well for the data types that they are designed for, there is a lack of integration of the different data types. This can potentially hamper the overall performance and efficiency of the AV system. For instance, despite having defence mechanisms that are optimised for mitigating attacks targeted at image data, the AV could be susceptible to attacks targeting LiDAR signals.

This is why this work package is centred on creating a unified multi-view and multi-modal representation that not only incorporates and reasons over rich information, but also draws on inputs from different camera lenses, images, and LiDAR signals. In so doing, the research team intends to build robust models that can keep pace with constantly changing scenarios or simulate the threats/attacks that occur in the physical world.
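
The article does not specify the fusion architecture; the snippet below is only a rough PyTorch-style sketch of the general idea of projecting camera images and LiDAR point clouds into a shared feature space before a downstream task head. The module names, encoder designs, and dimensions are all assumptions.

```python
import torch
import torch.nn as nn

class UnifiedAVRepresentation(nn.Module):
    """Hypothetical sketch: fuse image and LiDAR features into one embedding."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        # Image branch: a small convolutional encoder with global pooling.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, embed_dim),
        )
        # LiDAR branch: a per-point MLP followed by max pooling (PointNet-style).
        self.lidar_encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, embed_dim),
        )
        # Fusion layer combining both modalities into one representation.
        self.fusion = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, image: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_encoder(image)                      # (B, embed_dim)
        pts_feat = self.lidar_encoder(points).max(dim=1).values   # (B, embed_dim)
        return self.fusion(torch.cat([img_feat, pts_feat], dim=-1))

# Usage with hypothetical shapes: a batch of 2 camera frames and LiDAR sweeps.
model = UnifiedAVRepresentation()
fused = model(torch.randn(2, 3, 128, 256), torch.randn(2, 4096, 3))
print(fused.shape)  # torch.Size([2, 256])
```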

Given the complexity and costs associated with conducting experiments in the physical world, Work Package 2 is centred on synthesising real-world threats between the digital world and the physical world.

By utilising the unified representation described in Work Package 1, the research team aims to iterate and evaluate the robustness of the CV models more effectively and efficiently before they are deployed to physical AV systems.

In addition, with the unified representation, the research team will be able to simulate changes in weather conditions, modify objects in the scene, or even incorporate other real-world variations such as someone running in front of the AV.
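
As an example of the kind of digital variation that can be simulated cheaply before any physical testing, the sketch below blends an image toward uniform grey to mimic fog and scales its brightness down to mimic low light, after which whatever perception model is under evaluation can be re-run on the corrupted frames. This is a hypothetical NumPy illustration, not the team's simulator.

```python
import numpy as np

def add_fog(image: np.ndarray, severity: float) -> np.ndarray:
    """Blend the frame toward uniform grey; severity in [0, 1] (hypothetical fog model)."""
    fog = np.full_like(image, 200.0, dtype=np.float32)
    return ((1.0 - severity) * image.astype(np.float32) + severity * fog).astype(np.uint8)

def darken(image: np.ndarray, factor: float) -> np.ndarray:
    """Scale brightness down to mimic dusk or night driving."""
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)

# Hypothetical evaluation loop: measure how accuracy degrades as severity grows.
frame = np.random.randint(0, 256, size=(720, 1280, 3), dtype=np.uint8)
for severity in (0.0, 0.3, 0.6):
    corrupted = darken(add_fog(frame, severity), factor=1.0 - 0.5 * severity)
    # accuracy = evaluate(model, corrupted, labels)   # model/labels are placeholders
```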

The intent behind Work Package 3 is to integrate the multiple views and modalities by reconstructing the different scenarios to improve the AV's ability to make robust and correct decisions when it is subjected to threats or attacks.

The research team will also delve into the concept of adversarial repairing techniques to enhance the resilience of the AV.
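
The article does not detail the repairing technique. One common building block in this area is adversarial fine-tuning, sketched below with a single FGSM (fast gradient sign method) step in PyTorch; the model, optimizer, and hyperparameters are assumptions, and this is not necessarily the team's approach.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, images, labels, epsilon=0.01):
    """Generate FGSM adversarial examples: one signed-gradient step on the input."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    return (images + epsilon * images.grad.sign()).detach().clamp(0.0, 1.0)

def adversarial_finetune_step(model, optimizer, images, labels, epsilon=0.01):
    """One 'repair' step: train on adversarial examples so the model resists them."""
    adv_images = fgsm_perturb(model, images, labels, epsilon)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(adv_images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```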

Work Package 4 will deep dive into understanding the logic of the AI decision-making process.

To do this, the research team will first analyse the behaviours of neurons and their respective weights associated with model predictions, to gain insights into the decision logic employed by the AI system.

The researchers will also explore the important features that contribute to the model's predictions, specifically why certain data points are classified as precursors to attacks or threats. This exploration will enable the researchers to understand how the attacks exploit the vulnerabilities of AI decision-making processes.

Once a thorough understanding of the logic of the AI decision-making process is obtained, the researchers will move on to enhancing the models' robustness against such attacks by adjusting the influence of neurons and/or prediction weights to build more resilient systems.
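
The article describes this work at a conceptual level; the sketch below is a rough, hypothetical illustration of one way such an idea could look in practice: compare a layer's neuron activations on clean versus attacked inputs, flag the neurons whose behaviour shifts the most, and damp their contribution to the prediction.

```python
import numpy as np

def suspicious_neurons(clean_acts: np.ndarray, attacked_acts: np.ndarray, k: int = 10):
    """Rank neurons by how much their mean activation changes under attack.

    clean_acts, attacked_acts: arrays of shape (num_samples, num_neurons).
    Returns the indices of the k neurons with the largest shift.
    """
    shift = np.abs(attacked_acts.mean(axis=0) - clean_acts.mean(axis=0))
    return np.argsort(shift)[-k:]

def damp_neurons(activations: np.ndarray, neuron_ids: np.ndarray, scale: float = 0.1):
    """Reduce the influence of flagged neurons when forming the model's prediction."""
    adjusted = activations.copy()
    adjusted[:, neuron_ids] *= scale
    return adjusted

# Hypothetical activations for 100 samples and 512 neurons in one layer.
clean = np.random.randn(100, 512)
attacked = clean + np.random.randn(100, 512) * 0.05
attacked[:, [3, 17, 42]] += 2.0            # simulate neurons hijacked by an attack
print(suspicious_neurons(clean, attacked, k=3))   # likely [3, 17, 42] in some order
```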

The outcome of completing the work packages will be a robust technology, on the basis of which the judges will determine which of the three teams advances to the final stage of the challenge.

Datasets

Such a comprehensive research project requires a large dataset, which will be drawn from four sources:

  1. Publicly available open-source datasets, including KITTI, MS-COCO, Cityscapes, and Pascal VOC
  2. Data collected from simulators in the laboratories
  3. Physical and real sample data provided by AISG and DSO National Laboratories
  4. Data obtained from Desay SV Automotive and CETRAN – which are the research team’s industry collaborators

What success is expected to look like

When asked what success would look like, Professor Xie answered, “The success of our research will be determined by achieving three key accomplishments.

“The first accomplishment will be the development of advanced attack and defence techniques that can effectively enhance the robustness of CV systems in both the digital and physical worlds. In so doing, we hope to significantly improve the resilience and reliability of CV systems in the face of potential threats. We also want to meet or exceed the threshold of 80 percent accuracy recovery of the CV systems following physical threats or attacks.”

He concluded thus: “That said, our ultimate goal is to be one of the driving forces of innovation for the AV and automotive industry. Other than having Desay SV Automotive and CETRAN adopt our technology, it is our aspiration to have other leading AV companies and centres follow suit as well.”
