Consumers have widely used conversational AI systems such as Siri, Google Assistant and now ChatGPT. The next generation of conversational AI systems will have visual understanding capabilities, allowing them to communicate with users through both language and visual data. A core technology that enables such multimodal, human-like AI systems is visual question answering: the ability to answer questions based on information found in images and videos. This project aims to develop new visual question-answering technologies based on large-scale pre-trained vision-language models. Pre-trained models developed by tech giants, particularly OpenAI, have made headlines in recent years, e.g., ChatGPT, which can converse with users in human language, and DALL-E 2, which can generate realistic images. The project will study how best to utilise such models for visual question answering. It will systematically analyse their capabilities and limitations on this task and design technical solutions to bridge the gap between what pre-trained models can accomplish and what visual question-answering systems require. The end product of the project will be a new framework for building visual question-answering systems on top of existing pre-trained models with minimal additional training.
Data visualisations are widely used on mobile devices (e.g., smartphones), but they are often not mobile-friendly, either to create or to use. This project aims to develop novel techniques for mobile-friendly data visualisation, covering both the creation of visualisations suited to mobile devices and the design of effective multimodal interaction. The research outputs of this project are expected to significantly improve the effectiveness and usability of mobile data visualisations and to promote their wider application.
This project aims to improve the scalability of food recognition: to train classifiers that recognise a wide range of dishes regardless of cuisine or of the amount and type of training examples. Here, a “classifier” can be viewed as a “search engine” that retrieves the recipe for a food image. Training such classifiers requires a very large number of training examples composed of recipes and images, where each recipe is paired with at least one image as a visual reference. Training classifiers on paired (parallel) data faces several practical limitations: tens of thousands of recipe-image pairs are required; other forms of data that are widely available publicly cannot be leveraged for model training; and additional training data is required when the recipes are written in different natural languages. This project will address these limitations from the perspective of transfer learning. The aim is to train a generalised classifier that adapts more readily to new recognition settings by removing statistical bias, accounting for how the data evolves over time, and aligning the semantics of different languages.
This project aims to provide a solid foundation for analysing AI systems, together with techniques that facilitate the development of reliable and secure AI systems. Central to the research is the development of an executable specification, in the form of an abstract logical representation of all components used to build AI systems. This specification in turn enables powerful techniques for addressing three problems commonly encountered in AI systems: how to ensure the quality or correctness of AI libraries, how to systematically locate bugs in neural network programs, and how to fix those bugs. In other words, this project aims to define a semantics of AI models, thereby forming a solid foundation on which to build AI systems.
Text style transfer (TST) is the task of converting a piece of text written in one style (e.g., informal text) into text written in a different style (e.g., formal text). It has applications in many scenarios, such as AI-based writing assistance and the removal of offensive language from social media posts. In recent years, with the advance of large-scale pre-trained language models such as the Generative Pre-trained Transformer 3 (GPT-3), an autoregressive language model that uses deep learning to produce human-like text, solutions to TST have been shifting towards fine-tuning-based and prompt-based approaches. In this project, we will study how to effectively utilize pre-trained language models for TST under low-resource settings. We will also design ways to measure whether solutions based on pre-trained language models can disentangle content and style.
The governance of artificial intelligence (AI) to mitigate societal and individual harm through ethics-by-design calls for equal attention to responsible data use before public trust can be conferred on AI technologies. Since trust is fundamentally rooted in community relationships, AI regulators seeking public acceptance of AI innovation must attend to community-centric pathways for integrating data subjects’ voices into AI ethical decision-making. While traditional actuarial methods in financial audits can indicate a diverse range of evidence used to determine legal compliance, the researchers suggest that community interests and data subjects’ voices should not be absent from AI audit models. This research proposal will explore Singaporean (and Asian) perspectives on AI regulation to inform the motivations for using AI audits to rebuild public trust. Analysis of the proposed scope and methodologies of AI audits will be followed by recommendations on the relevant skillsets for future AI auditors.
This project aims to learn efficient semantic segmentation models without using expensive annotations. Specifically, we leverage image-level labels, the most economical form of annotation, to generate pseudo masks that facilitate the training of segmentation models. Finally, we will apply the resulting algorithms to remote sensing image segmentation on challenging continual, few-shot, and open-set datasets.
For the Singapore leader, the final audience is always larger than the physical audience at a particular venue. The importance of leadership oratory is not confined to live co-present audiences, as wider audiences have long viewed political and organisational leaders’ speeches via television (and radio) and through various recording technologies (VHS, DVD). Recently, it has become common for speeches to be broadcast live on the internet and/or disseminated via online video. As a result, they can be viewed by potentially vast and diverse national and global audiences at different times, in a wide variety of contexts, using a range of devices (Wenzel and Koch, 2018; Rossette-Crake, 2020). According to Rossette-Crake (2020), since the turn of the century it has become standard practice for speeches to be written and delivered with this in mind, and this is leading to changes akin to the way in which political oratory was transformed by radio and television during the 20th century (Greatbatch and Clark, 2005). Building on these points, this research project seeks to establish which oratorical practices are associated with positive persuasive outcomes and inspire trust and a sense of group cohesiveness amongst members of diverse audiences. It will answer two questions: (1) What are the verbal and non-verbal practices associated with establishing trust and a sense of group cohesiveness among members of diverse audiences during live speeches? (2) How do diverse audience members perceive the impact of these practices, and do the themes of the speeches also influence their perceptions?
This project aims to design a hierarchical, cross-network, multi-agent reinforcement-learning-based trading strategy generator and to examine governance frameworks for crypto asset markets.
This proposal contributes to Thrust 3 of the National Quantum Computing Hub (NQCH), which is focused on translational R&D, such as the development of libraries, pre-built models, and templates that enable easier and faster programming and development of software applications by early adopters in industry, government agencies and Institutes of Higher Learning (IHLs). This project aims to develop hybrid quantum-classical algorithms and tools that will contribute to the libraries and pre-built models for supply chain use cases. We aim to enhance the performance of Sample Average Approximation (SAA) and Simulation Optimization relative to classical techniques, in a way that is verifiable on today’s noisy intermediate-scale quantum (NISQ) hardware, and to apply these algorithms in supply chain risk management contexts. It is anticipated that these algorithms will yield higher-quality and computationally more attractive solutions than purely classical algorithms.