SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore
SGHateCheck is a pioneering framework designed to address the unique challenges of hate speech detection in Singapore's multilingual and culturally diverse landscape. Building on existing methodologies such as HateCheck and MHC, SGHateCheck introduces functional tests for four key languages: Singlish, Mandarin, Malay, and Tamil. This project highlights the limitations of state-of-the-art models in accurately moderating content in Southeast Asian contexts, driving the development of more inclusive and effective hate speech detection systems. Explore how SGHateCheck is shaping the future of online trust and safety in low-resource language settings. This project is supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 2.
20 June 2024
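The idea behind functional testing can be illustrated with a minimal sketch: test cases are grouped by the functionality they probe, each case carries a gold label, and a classifier is scored separately on each group. Everything below is an assumption made for illustration: the functionality names, the templated placeholder texts, and the keyword classifier are hypothetical and are not taken from the SGHateCheck test suite.

```python
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple


class FunctionalCase(NamedTuple):
    functionality: str  # hypothetical functionality name
    text: str           # templated placeholder text, not real SGHateCheck data
    gold_label: str     # "hateful" or "non-hateful"


def accuracy_by_functionality(
    cases: List[FunctionalCase], classify: Callable[[str], str]
) -> Dict[str, float]:
    """Return the classifier's accuracy on each functionality separately."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for case in cases:
        total[case.functionality] += 1
        if classify(case.text) == case.gold_label:
            correct[case.functionality] += 1
    return {name: correct[name] / total[name] for name in total}


# Hypothetical test cases; "[GROUP]" stands in for a protected-group term.
CASES = [
    FunctionalCase("explicit_derogation", "I hate [GROUP].", "hateful"),
    FunctionalCase("explicit_derogation", "[GROUP] are disgusting.", "hateful"),
    FunctionalCase("negated_hate", "I do not hate [GROUP].", "non-hateful"),
    FunctionalCase("negated_hate", "Nobody should hate [GROUP].", "non-hateful"),
]


def keyword_classifier(text: str) -> str:
    """Toy stand-in for a real model: flags any text containing a trigger word."""
    lowered = text.lower()
    if "hate" in lowered or "disgusting" in lowered:
        return "hateful"
    return "non-hateful"


scores = accuracy_by_functionality(CASES, keyword_classifier)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

In this sketch the toy classifier handles the explicit cases but fails every negated case, which is the kind of targeted weakness that per-functionality scoring is meant to expose and that a single aggregate accuracy figure would hide.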
MemeCraft: Contextual and Stance-Driven Multimodal Meme Generation
MemeCraft is an innovative project that harnesses the power of generative AI to create impactful memes for social good. Developed at SUTD, this tool uses advanced language and visual models to generate memes that support important social movements such as climate action and gender equality. By ensuring that the generated content is both humorous and respectful, MemeCraft aims to engage online audiences in meaningful discourse. This versatile technology can also be applied to other social activism and marketing campaigns, making it a powerful tool for spreading awareness and promoting positive change across various causes.
13 May 2024
AutoChart: A Dataset for Chart-to-Text Generation
AutoChart represents a significant advancement in natural language generation by addressing the underexplored area of chart-to-text generation. By introducing a novel framework that automatically generates analytical descriptions for various types of charts, AutoChart enables scalable and efficient data interpretation. The dataset, consisting of over 10,000 chart-description pairs, facilitates research in both natural language processing and computer vision, offering applications in academic writing, accessibility for visually impaired users, and automated report generation. AutoChart's integration of linguistic rhetorical moves further ensures that the generated descriptions are not only informative but also coherent and contextually relevant—paving the way for innovative applications in education, journalism, and automated content generation. Preliminary research in this area is funded by Living Sky Technologies.
16 August 2021
AutoChart: A multimodal text generation model that recognizes charts and generates a text analysis. The above analysis is generated by AutoChart automatically.
Analyzing Antisocial Behaviors Amid COVID-19 Pandemic
In the wake of the COVID-19 pandemic, online platforms saw an alarming rise in antisocial behaviors, including hate speech and xenophobia. Our project tackles this issue head-on by developing one of the largest annotated datasets of its kind, comprising over 40 million COVID-19-related tweets. Using a novel automated annotation framework, we analyzed toxic content targeting vulnerable communities, shedding light on new abusive lexicons that emerged during the pandemic. This research opens up pathways for developing more robust tools to monitor and curb harmful online behaviors during global crises. We also partner with the Saskatchewan Human Rights Commission on a sub-project investigating online xenophobia against Asian communities during the COVID-19 pandemic. This sub-project is funded by a Partnership Engage Grant from the Social Sciences and Humanities Research Council of Canada.
21 July 2020
Analysis of online xenophobic behaviors on Twitter amid the COVID-19 pandemic. We have collected and annotated a dataset of over 40 million COVID-19-related tweets.
User Profiling Across Multiple Online Social Platforms
Online social platforms (OSPs), such as Facebook, Twitter, and Instagram, have grown monumentally in recent years. As of August 2017, Facebook reportedly had over 2 billion monthly active users, while Instagram and Twitter had over 700 million and 300 million monthly active user accounts, respectively. The vast amount of user-generated content and social data gathered on these behemoth platforms has made them rich data sources for academic and industrial research. However, most existing research has focused on analyzing and modeling user behaviors in a single-platform setting, neglecting the interdependencies of user behaviors across multiple OSPs. In this project, we design novel techniques that enable the analysis and modeling of user behaviors across multiple OSPs. In particular, we have developed algorithms that (i) link users' profiles across multiple social platforms, (ii) analyze users' topical interests and platform preferences across multiple OSPs, and (iii) model influential users in multiple OSPs.
Linky: A visual analytical tool that extracts the results from different user identity linkage methods performed on multiple online social networks and visualizes the user profiles, content and ego networks of the linked user identities.
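To give a rough sense of what linking user profiles across platforms involves, the sketch below matches accounts by username similarity alone. This is an illustrative assumption, not the project's method: the function names, the similarity threshold, and the sample usernames are hypothetical, and practical linkage methods typically draw on much richer signals such as profile attributes, content, and network structure.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_profiles(
    platform_a: List[str], platform_b: List[str], threshold: float = 0.8
) -> Dict[str, str]:
    """Link each username on platform A to its most similar username on platform B.

    A pair is kept only if its similarity reaches the (hypothetical) threshold.
    """
    links: Dict[str, str] = {}
    for name_a in platform_a:
        best = max(
            platform_b,
            key=lambda name_b: name_similarity(name_a, name_b),
            default=None,
        )
        if best is not None and name_similarity(name_a, best) >= threshold:
            links[name_a] = best
    return links


# Hypothetical usernames on two platforms.
links = link_profiles(["alice_tan", "bob92"], ["AliceTan", "charlie"])
print(links)
```

Here only the first account finds a sufficiently similar counterpart; the second is left unlinked rather than forced into a poor match, which is why a threshold is used instead of always taking the best candidate.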