This is a project under the AI Singapore 100 Experiments Programme. The project focuses on the healthcare industry resource management where there is a complex relationship not just among the various manpower types (doctors, nurses) but also with the patient lifecycle leadtimes, geo-location, medical equipment and facility needed to perform surgeries and patient care. Manpower shortage has birthed conservative and static long-term planning solutions without considering these upstream data flows. In post-covid world today, this project could bring more potential solutions to the manpower allocation and development problem, especially when demand changes acutely. The project sponsor, BIPO Service (Singapore) Pte Ltd believes that an AI-driven, short-input-to-output cycle HR system streaming in “demand”-pulled patient lifecycle data can allocate and inform skills development not only for full time, but part time workforce.
This research/project is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG2-100E-2023-118).
Most conversational systems today are not very good at adapting to new or unexpected situations when serving the end user in a dynamic environment. Models trained on fixed training datasets often fail easily in practical application scenarios. Existing methods for the fundamental task of conversation understanding rely heavily on training slot-filling models with a predefined ontology. For example, given an utterance such as “book a table for two persons in Blu Kouzina,” the models classify it into one of the predetermined intents book-table, predict specific values such as “two persons” and “Blu Kouzina” to fill predefined slots number_of_people and restaurant_name, respectively. The agent’s inherent conversation ontology comprises these intents, slots, and corresponding values. When end users say things outside of the predefined ontology, the agent tends to misunderstand the utterance and may cause critical errors. The aim of this project is to investigate how conversational agents can proactively detect new intents, values, and slots, and expand their conversation ontology on-the-fly to handle unseen situations better during deployment.
State-of-the-art visual perception models in autonomous vehicles (AV) fail in the physical world when meeting adversarially designed physical objects/environmental conditions. The main reason is that they are trained with discretely-sampled samples and can hardly cover all possibilities in the real world. Although effective, existing physical attacks consider one or two physical factors and cannot simulate dynamic entities (e.g., moving cars or persons, street structures) and environment factors (e.g., weather variation and light variation) jointly. Meanwhile, most defence methods like denoising or adversarial training (AT) mainly rely on single-view or single-modal information, neglecting the multi-view cameras and different modality sensors on the AV, which contain rich complementary information. The above challenges in both attacks and defenses are caused by the lack of a continuous and unified scene representation for the AV scenarios. Motivated by the above limitations, this project firstly aims to develop a unified AV scene representation based on the neural implicit representation to generate realistic new scenes. With this representation, we will develop extensive physical attacks, multi-view & multi-modal defenses, as well as a more complete evaluation framework. Specifically, the project will build a unified physical attack framework against AV perception models, which can adversarially optimize the physical-related parameters and generate more threatening examples that could happen in the real world. Furthermore, the project will build the multi-view and multi-modal defensive methods including a data reconstruction framework to reconstruct clean inputs and a novel ‘adversarial training’ method, i.e., adversarial repairing that enhances the robustness of the deep models with guidance of collected adversarial scenes. Finally, a robust-oriented explainable method will be developed to understand the behaviors of visual perception models under physical adversarial attacks and robustness enhancement.
This project will pioneer approaches that realize trusted automation bots that act as concierges and interactive advisors to software engineers to improve their productivity as well as software quality. TrustedSEERs will realize such automation by effectively learning from domain-specific, loosely-linked, multi-modal, multi-source and evolving software artefacts (e.g., source code, version history, bug reports, blogs, documentation, Q&A posts, videos, etc.). These artefacts can come from the organization deploying the automation bots, a group of collaborating yet privacy-aware organizations, and from freely available yet possibly licensed (e.g., GPL v2, GPL v3, MIT, etc.) data contributed by many, including untrusted entities, on the internet. TrustedSEERs will bring about the next generation of Software Analytics (SA) – a rapidly growing research area in the Software Engineering research field that turns data into automation – by establishing two initiatives: First, data-centric SA, through the design and development of methods that can systematically engineer (link, select, transform, synthesize, and label) data needed to learn more effective SA bots from diverse software artefacts, many of which are domain-specific and unique. Second, trustworthy SA, through the design and development of mechanisms that can engender software engineers’ trust in SA bots considering both intrinsic factors (explainability) and extrinsic ones (compliance to privacy and copyright laws and robustness to external attacks). In addition, TrustedSEERs will apply its core technologies to synergistic applications to improve engineer productivity and software security.
Consumers have widely used conversational AI systems such as Siri, Google Assistant and now ChatGPT. The next generation of conversational AI systems will have visual understanding capabilities to communicate with users through language and visual data. A core technology that enables such multimodal, human-like AI systems is visual question answering and the ability to answer questions based on information found in images and videos. This project focuses on visual question answering and aims to develop new visual question-answering technologies based on large-scale pre-trained vision-language models. Pre-training models developed by tech giants, particularly OpenAI, have made headlines in recent years, e.g., ChatGPT, which can converse with users in human language, and DALL-E 2, which can generate realistic images. This project aims to study how to best utilise large-scale pre-trained vision-language models for visual question answering. The project will systematically analyse these pre-trained models in terms of their capabilities and limitations in visual question answering and design technical solutions to bridge the gap between what pre-trained models can accomplish and what visual question answering systems require. The end of the project will be a new framework for building visual question-answering systems based on existing pre-trained models with minimal additional training.
Data visualisations have been widely used on mobile devices (e.g., smartphones), but they suffer from mobile-friendly issues in terms of their creation and usage. This project aims to develop novel techniques to achieve mobile-friendly data visualisations, including desirable mobile data visualisation creation and effective multimodal interaction design. The research outputs of this project will significantly improve the effectiveness and usability of mobile data visualisations and further promote their applications.
This project aims to improve the scalability of food recognition – to train classifier(s) that recognise a wide range of dishes regardless of cuisines, the amount and type of training examples. Here, “classifier” can be viewed as a “search engine” that retrieves the recipe of a food image. Training such classifiers requires an excessive number of training examples composed of recipes and images, where each recipe is paired with at least an image as visual reference. Training classifiers using paired or parallel data faces several practical limitations – tens of thousands of recipe-image pairs are required for training; other forms of data that are largely available in the public cannot be leveraged for model training; and additional training data is required when the recipes are written in different natural languages. Through the project, these practical limitations will be addressed from the perspective of transfer learning. The aim is to train a generalised classifier that is more adaptable for recognition, by removing the statistical bias, considering the evolving process, and aligning the semantics of different languages in machine learning.
This project aims to provide a solid foundation for analysing AI systems as well as techniques used to facilitate the development of reliable secure AI systems. Central to the research is to develop an executable specification in the form of an abstract logical representation of all components that are used to build artificial intelligence, which subsequentially enables powerful techniques to address three problems commonly encountered in AI systems, namely, how to ensure the quality or correctness of AI libraries, how to systematically locate bugs in neural network programs, and how to fix the bug. In other words, this project aims to define a semantics of AI models, thereby forming a solid fundamental to build AI systems upon.
Text style transfer (TST) is the task of converting a piece of text written in one style (e.g., informal text) into text written in a different style (e.g., formal text). It has applications in many scenarios such as AI-based writing assistance and removal of offensive language in social media posts. Recent years, with the advances of pre-trained large-scale language models such as the Generative Pre-trained Transformer 3 (GPT-3) which is an autoregressive language model that uses deep learning to produce human-like text, solutions to TST are now shifting to fine-tuning-based and prompt-based approaches. In this project, we will study how to effectively utilize pre-trained language models for TST under low-resource settings. We will also design ways to measure whether solutions based on pre-trained language models can disentangle content and style.
This project aims for learning efficient semantic segmentation models without using expensive annotations. Specifically, we leverage the most economical image-level labels to generate pseudo masks to facilitate the training of segmentation models. In the end, we will apply the resultant algorithms on tackling the remote sensing image segmentation in the challenging Continual, Few-shot, and Open-set Datasets.