State-of-the-art visual perception models in autonomous vehicles (AVs) fail in the physical world when they encounter adversarially designed physical objects or environmental conditions. The main reason is that they are trained on discretely sampled data, which can hardly cover all possibilities in the real world. Although effective, existing physical attacks consider only one or two physical factors and cannot jointly simulate dynamic entities (e.g., moving cars or persons, street structures) and environmental factors (e.g., weather and lighting variation). Meanwhile, most defense methods, such as denoising or adversarial training (AT), rely mainly on single-view or single-modal information, neglecting the multi-view cameras and multi-modal sensors on the AV, which provide rich complementary information. These challenges in both attacks and defenses stem from the lack of a continuous and unified scene representation for AV scenarios. Motivated by these limitations, this project first aims to develop a unified AV scene representation based on neural implicit representations to generate realistic new scenes. With this representation, we will develop extensive physical attacks, multi-view and multi-modal defenses, and a more complete evaluation framework. Specifically, the project will build a unified physical attack framework against AV perception models that adversarially optimizes physical parameters to generate more threatening examples that could occur in the real world. Furthermore, the project will build multi-view and multi-modal defensive methods, including a data reconstruction framework that reconstructs clean inputs and a novel ‘adversarial training’ method, namely adversarial repairing, which enhances the robustness of deep models under the guidance of collected adversarial scenes. Finally, a robustness-oriented explanation method will be developed to understand the behavior of visual perception models under physical adversarial attacks and robustness enhancement.
This project, TrustedSEERs, will pioneer approaches that realize trusted automation bots that act as concierges and interactive advisors to software engineers, improving their productivity as well as software quality. TrustedSEERs will realize such automation by effectively learning from domain-specific, loosely linked, multi-modal, multi-source and evolving software artefacts (e.g., source code, version history, bug reports, blogs, documentation, Q&A posts, and videos). These artefacts can come from the organization deploying the automation bots, from a group of collaborating yet privacy-aware organizations, and from freely available yet possibly licensed (e.g., GPL v2, GPL v3, MIT) data contributed by many parties, including untrusted entities, on the internet. TrustedSEERs will bring about the next generation of Software Analytics (SA), a rapidly growing area of Software Engineering research that turns data into automation, by establishing two initiatives. The first is data-centric SA, through the design and development of methods that can systematically engineer (link, select, transform, synthesize, and label) the data needed to learn more effective SA bots from diverse software artefacts, many of which are domain-specific and unique. The second is trustworthy SA, through the design and development of mechanisms that can engender software engineers’ trust in SA bots, considering both intrinsic factors (explainability) and extrinsic ones (compliance with privacy and copyright laws and robustness to external attacks). In addition, TrustedSEERs will apply its core technologies to synergistic applications to improve engineer productivity and software security.
Consumers have widely used conversational AI systems such as Siri, Google Assistant and now ChatGPT. The next generation of conversational AI systems will have visual understanding capabilities, allowing them to communicate with users through both language and visual data. A core technology that enables such multimodal, human-like AI systems is visual question answering, the ability to answer questions based on information found in images and videos. This project focuses on visual question answering and aims to develop new visual question-answering technologies based on large-scale pre-trained vision-language models. Pre-trained models developed by tech giants, particularly OpenAI, have made headlines in recent years, e.g., ChatGPT, which can converse with users in human language, and DALL-E 2, which can generate realistic images. This project aims to study how best to utilise large-scale pre-trained vision-language models for visual question answering. The project will systematically analyse the capabilities and limitations of these pre-trained models in visual question answering and design technical solutions to bridge the gap between what pre-trained models can accomplish and what visual question-answering systems require. The end result of the project will be a new framework for building visual question-answering systems based on existing pre-trained models with minimal additional training.
Data visualisations are widely used on mobile devices (e.g., smartphones), but their creation and usage are often not mobile-friendly. This project aims to develop novel techniques for mobile-friendly data visualisations, covering both the creation of visualisations suited to mobile devices and effective multimodal interaction design. The research outputs of this project will significantly improve the effectiveness and usability of mobile data visualisations and further promote their applications.
This project aims to improve the scalability of food recognition, that is, to train classifiers that recognise a wide range of dishes regardless of cuisine or the amount and type of training examples. Here, a “classifier” can be viewed as a “search engine” that retrieves the recipe for a food image. Training such classifiers requires an excessive number of training examples composed of recipes and images, where each recipe is paired with at least one image as a visual reference. Training classifiers on paired or parallel data faces several practical limitations: tens of thousands of recipe-image pairs are required for training; other forms of data that are widely available to the public cannot be leveraged for model training; and additional training data is required when the recipes are written in different natural languages. This project will address these practical limitations from the perspective of transfer learning. The aim is to train a generalised classifier that is more adaptable for recognition by removing statistical bias, accounting for the evolving process, and aligning the semantics of different languages in machine learning.
This project aims to provide a solid foundation for analysing AI systems, as well as techniques that facilitate the development of reliable and secure AI systems. Central to the research is the development of an executable specification, in the form of an abstract logical representation of all components used to build artificial intelligence, which subsequently enables powerful techniques to address three problems commonly encountered in AI systems: how to ensure the quality or correctness of AI libraries, how to systematically locate bugs in neural network programs, and how to fix those bugs. In other words, this project aims to define a semantics of AI models, thereby forming a solid foundation on which to build AI systems.
Text style transfer (TST) is the task of converting a piece of text written in one style (e.g., informal text) into text written in a different style (e.g., formal text). It has applications in many scenarios, such as AI-based writing assistance and the removal of offensive language from social media posts. In recent years, with advances in large-scale pre-trained language models such as the Generative Pre-trained Transformer 3 (GPT-3), an autoregressive language model that uses deep learning to produce human-like text, solutions to TST have been shifting to fine-tuning-based and prompt-based approaches. In this project, we will study how to effectively utilize pre-trained language models for TST under low-resource settings. We will also design ways to measure whether solutions based on pre-trained language models can disentangle content and style.
This project aims to learn efficient semantic segmentation models without using expensive annotations. Specifically, we leverage image-level labels, the most economical form of annotation, to generate pseudo masks that facilitate the training of segmentation models. Finally, we will apply the resulting algorithms to remote sensing image segmentation on challenging continual, few-shot, and open-set datasets.
This project aims to design a hierarchical, cross-network, multi-agent reinforcement-learning-based trading strategy generator and to examine a governance framework for crypto asset markets.
This proposal contributes to Thrust 3 of the National Quantum Computing Hub (NQCH), which focuses on translational R&D, such as the development of libraries, pre-built models, and templates that enable easier and faster programming and development of software applications by early adopters in industry, government agencies, and Institutes of Higher Learning (IHLs). This project aims to develop hybrid quantum-classical algorithms and tools that will contribute to the libraries and pre-built models for supply chain use cases. We aim to enhance the performance of Sample Average Approximation (SAA) and Simulation Optimization relative to classical techniques, in a way that is verifiable on today’s noisy intermediate-scale quantum (NISQ) hardware, and to apply these algorithms to supply chain risk management contexts. It is anticipated that these algorithms will achieve higher-quality and computationally more attractive solutions than purely classical algorithms.
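For readers unfamiliar with Sample Average Approximation, the following is a minimal, purely classical sketch of the idea: an expectation over uncertain demand is replaced by an average over sampled scenarios, and the resulting deterministic problem is solved. It is not the project's hybrid quantum-classical method. The single-product newsvendor setting, the function name `saa_newsvendor`, and all parameter values are assumptions chosen only for illustration.

```python
import random
from typing import List, Sequence, Tuple


def saa_newsvendor(
    demands: Sequence[float], unit_cost: float, unit_price: float
) -> Tuple[float, float]:
    """Pick the order quantity that maximises average profit over sampled demands.

    Hypothetical illustration: assumes 0 < unit_cost < unit_price and no salvage value.
    """

    def avg_profit(quantity: float) -> float:
        # Sample-average estimate of expected profit for this order quantity.
        total = sum(unit_price * min(quantity, d) - unit_cost * quantity for d in demands)
        return total / len(demands)

    # The sample-average objective is concave and piecewise linear with
    # breakpoints at the sampled demands, so a maximiser lies at one of them.
    candidates: List[float] = sorted(set(demands))
    best_quantity = max(candidates, key=avg_profit)
    return best_quantity, avg_profit(best_quantity)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the scenario sample is reproducible
    scenarios = [max(0.0, rng.gauss(100.0, 20.0)) for _ in range(1000)]
    quantity, profit = saa_newsvendor(scenarios, unit_cost=1.0, unit_price=3.0)
    print(f"SAA order quantity: {quantity:.1f}, estimated profit: {profit:.1f}")
```

In this toy setting the sampled problem can be solved by enumeration. In realistic supply chain models the sampled problem is itself a large optimisation problem, and solving it is the expensive step.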