This study investigates how large language models (LLMs) handle offensive language detection in cases where human annotators disagree, a common challenge in real-world content moderation. Using the MD-Agreement dataset, the research systematically evaluates multiple LLMs, analyzing their classification accuracy, confidence calibration, and alignment with human judgments across varying levels of annotation disagreement. Findings reveal that LLMs often exhibit overconfidence in ambiguous cases, misclassifying subjective content as offensive and failing to reflect human-like uncertainty. The study further shows that incorporating disagreement samples in few-shot learning and instruction fine-tuning significantly improves detection accuracy and model alignment with human reasoning. This work highlights the need for disagreement-aware training strategies to enhance the robustness and fairness of AI moderation systems. Explore how this research advances our understanding of LLMs as decision-makers in subjective and sensitive AI applications.
28 July 2025
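The few-shot strategy lends itself to a compact illustration. Below is a minimal sketch, using hypothetical example messages and a plain prompt-building function (not the study's exact prompts or MD-Agreement items), of how high-disagreement examples can be surfaced alongside unanimous ones in a few-shot prompt:

```python
# Sketch: building a few-shot prompt that includes high-disagreement examples,
# so the model also sees cases where human annotators were split.
# The example messages below are hypothetical placeholders, not MD-Agreement items.

def build_prompt(few_shot_examples, target_text):
    """few_shot_examples: list of (text, label, agreement) tuples, where
    agreement is the fraction of annotators who agreed with the label."""
    lines = [
        "Classify each message as OFFENSIVE or NOT_OFFENSIVE.",
        "Some examples were ambiguous even for human annotators; the annotator "
        "agreement level is shown for context.",
        "",
    ]
    for text, label, agreement in few_shot_examples:
        lines.append(f"Message: {text}")
        lines.append(f"Annotator agreement: {agreement:.0%}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Message: {target_text}")
    lines.append("Label:")
    return "\n".join(lines)

examples = [
    ("You people never learn anything.", "OFFENSIVE", 0.6),     # low-agreement case
    ("Have a great weekend, everyone!", "NOT_OFFENSIVE", 1.0),  # unanimous case
]
print(build_prompt(examples, "That take is embarrassingly bad."))
```

Exposing the annotator agreement level in the prompt is one simple way to nudge the model toward calibrated, human-like uncertainty on ambiguous inputs.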
CONFACT introduces a novel dataset and evaluation framework designed to assess the robustness of retrieval-augmented generation (RAG) systems in fact-checking scenarios where evidence from multiple sources presents conflicting information. The study highlights that existing RAG methods, even with advanced LLMs like GPT-4, struggle to discern and prioritize credible sources, leading to fact-checking failures in the presence of misinformation. To address this, the research explores strategies for integrating media source credibility into the retrieval, ranking, and answer generation stages. Experiments demonstrate that incorporating source background information, especially during answer generation, significantly enhances fact-checking accuracy. CONFACT sets a new benchmark for evaluating credibility-aware fact-checking, paving the way for more reliable AI-assisted verification systems. Discover how this work advances the fight against misinformation by strengthening fact-checking with source-aware reasoning.
23 May 2025
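As a rough illustration of credibility-aware evidence handling, the sketch below reranks retrieved passages with a source-credibility prior before they reach the generation stage. The credibility scores, weighting scheme, and example documents are illustrative assumptions, not CONFACT's actual method:

```python
# Sketch: credibility-weighted reranking of retrieved evidence.
# Both the credibility scores and the weighting scheme are illustrative assumptions.

SOURCE_CREDIBILITY = {          # hypothetical prior scores in [0, 1]
    "reuters.com": 0.95,
    "unknown-blog.net": 0.30,
}

def rerank(evidence, alpha=0.5):
    """evidence: list of dicts with 'text', 'source', 'retrieval_score' in [0, 1].
    Combines retrieval relevance with a source-credibility prior."""
    def combined(e):
        cred = SOURCE_CREDIBILITY.get(e["source"], 0.5)  # default for unseen sources
        return alpha * e["retrieval_score"] + (1 - alpha) * cred
    return sorted(evidence, key=combined, reverse=True)

docs = [
    {"text": "Official records contradict the claim.", "source": "reuters.com",
     "retrieval_score": 0.70},
    {"text": "Insiders confirm the claim.", "source": "unknown-blog.net",
     "retrieval_score": 0.80},
]
for d in rerank(docs):
    print(d["source"], "->", d["text"])
```

Reranked evidence (or the credibility metadata itself) can then be passed into the answer-generation prompt, which is the stage the experiments found most beneficial.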
This work introduces a comprehensive framework to evaluate how large language models (LLMs) respond to demographic persona prompts, particularly in scenarios involving social power disparities. By examining semantic shifts and response quality across nine demographic axes (e.g., race, gender, age, disability), the study reveals that LLMs exhibit a “default persona” bias favoring middle-aged, able-bodied, Caucasian, native-born males with centrist views. Additionally, the research demonstrates that power-imbalanced scenarios amplify these biases, leading to increased variability and lower-quality responses for marginalized identities. The findings underscore the importance of addressing implicit biases in LLMs, especially as they are deployed in socially sensitive applications. Explore how this study advances bias detection and mitigation strategies for persona-driven AI interactions.
22 April 2025
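One way to make "semantic shift" concrete is to compare embeddings of responses generated with and without a persona prompt. The sketch below assumes the sentence-transformers library and uses hypothetical responses; the study's own metrics may differ:

```python
# Sketch: quantifying the semantic shift between a baseline response and a
# persona-conditioned response. Model choice and response texts are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

baseline_response = "You should negotiate calmly and document the agreement."
persona_response = "You should probably just accept whatever terms are offered."

emb = model.encode([baseline_response, persona_response], convert_to_tensor=True)
similarity = util.cos_sim(emb[0], emb[1]).item()
print(f"Semantic shift (1 - cosine similarity): {1 - similarity:.3f}")
```

Averaging such shift scores per demographic axis is a straightforward way to surface which identities receive responses that diverge most from the model's default persona.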
LongGenBench introduces the first benchmark specifically designed to evaluate large language models (LLMs) on their ability to generate high-quality, instruction-following long-form text. Unlike existing benchmarks that focus on long-context retrieval, LongGenBench challenges models to produce coherent outputs of up to 32K tokens across complex, real-world scenarios such as diary writing, skyscraper design, and urban planning. The study evaluates ten state-of-the-art LLMs, revealing that even models adept at long-context understanding struggle to maintain coherence, instruction adherence, and content diversity over extended outputs. LongGenBench provides a critical tool for advancing long-form generation research, highlighting the need for improved architectures and training methods. Explore this work to understand how LongGenBench sets a new standard for assessing LLMs in long-output generation tasks essential for applications like technical documentation and creative writing.
11 March 2025
This study proposes a novel cross-modal transfer approach to address the persistent challenge of data scarcity in hateful video detection. By leveraging widely available hateful meme datasets, the research introduces a human-assisted re-annotation pipeline that aligns meme labels with video task definitions, enabling memes to serve as both substitutes and augmentations for video datasets. Experiments demonstrate that vision-language models fine-tuned on re-annotated memes achieve comparable or superior performance to models trained directly on video data. Furthermore, combining re-annotated memes with small video datasets yields significant performance improvements, setting new benchmarks in hateful video detection. This work highlights a scalable, cost-effective solution for enhancing multimodal content moderation systems in data-constrained environments. Explore how this approach bridges modality gaps to advance the future of video-based hate speech detection.
26 January 2025
ToxiCloakCN introduces a novel dataset designed to evaluate the robustness of offensive language detection models against cloaking techniques such as homophone substitutions and emoji transformations. Targeting Chinese-language content, this work highlights the vulnerabilities of large language models (LLMs) in detecting cloaked offensive texts, particularly in cases involving racism, sexism, regional bias, and anti-LGBTQ+ rhetoric. The study reveals significant performance declines for state-of-the-art models, including GPT-4, when confronted with perturbed data. ToxiCloakCN underscores the need for advanced techniques to combat evolving evasion strategies, contributing to safer online environments and more robust content moderation systems. Explore the findings to learn how this work advances offensive language detection in complex linguistic settings.
12 November 2024
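To make the cloaking techniques concrete, the sketch below applies toy homophone and emoji substitution tables to a neutral Chinese sentence; the dataset's actual perturbation rules are far richer, and these mappings are purely illustrative:

```python
# Sketch: applying simple cloaking perturbations of the kind ToxiCloakCN studies.
# The substitution tables are tiny illustrative stand-ins, not the dataset's rules.

HOMOPHONE_MAP = {"你": "妮", "是": "式"}   # similar pronunciation, different characters
EMOJI_MAP = {"牛": "🐮", "猪": "🐷"}        # character replaced by a pictographic emoji

def cloak(text, table):
    return "".join(table.get(ch, ch) for ch in text)

original = "你是牛人"                      # neutral example: "you are an impressive person"
print(cloak(original, HOMOPHONE_MAP))      # homophone substitution
print(cloak(original, EMOJI_MAP))          # emoji transformation
```

Such surface-level perturbations preserve meaning for human readers while degrading the detectors' performance, which is exactly the robustness gap the benchmark measures.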
MultiHateClip introduces a novel multilingual benchmark dataset designed for hateful video detection, focusing on gender-based hate speech across English and Chinese contexts. By annotating 2,000 videos from YouTube and Bilibili for hatefulness, offensiveness, and normalcy, it provides insights into cross-cultural differences in multimodal hate speech dynamics. Leveraging state-of-the-art models like GPT-4V and mBERT ⊙ MFCC ⊙ ViViT, the study highlights the complexities in differentiating hateful from offensive content and the limitations of existing models in non-Western contexts. MultiHateClip paves the way for a more inclusive, culturally nuanced approach to online hate detection. Explore the dataset and findings to advance multimodal analysis for combating online hate speech.
28 October 2024
InstructAV introduces a groundbreaking approach to authorship verification (AV) by fine-tuning large language models with instructions and leveraging a parameter-efficient fine-tuning (PEFT) method. This framework is distinct in its dual focus on improving classification accuracy and generating clear, detailed linguistic explanations, addressing a significant gap in the AV domain. By incorporating datasets with explanatory labels and employing the LoRA fine-tuning technique, InstructAV achieves state-of-the-art performance, as demonstrated across diverse datasets like IMDB, Twitter, and Yelp Reviews. This innovative methodology not only enhances the transparency and reliability of AV systems but also paves the way for advancements in explainable AI. Explore how InstructAV is shaping the future of authorship verification.
10 July 2024
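For readers unfamiliar with parameter-efficient fine-tuning, the sketch below shows a typical LoRA setup using the Hugging Face peft library. The base model and hyperparameters are illustrative assumptions rather than InstructAV's reported configuration:

```python
# Sketch: a generic LoRA (PEFT) fine-tuning setup of the kind InstructAV builds on.
# Base model and LoRA hyperparameters here are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # hypothetical choice of base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only a small fraction of weights are trained
```

Each training example would then pair two texts with a verification instruction and a target containing both the same-author decision and its linguistic explanation, so the adapter learns to produce the justification alongside the label.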
SGHateCheck is a pioneering framework designed to address the unique challenges of hate speech detection in Singapore's multilingual and culturally diverse landscape. By building on existing methodologies like HateCheck and MHC, SGHateCheck introduces functional tests for four key languages—Singlish, Mandarin, Malay, and Tamil. This project highlights the limitations of state-of-the-art models in accurately moderating content in Southeast Asian contexts, driving the development of more inclusive and effective hate speech detection systems. Explore how SGHateCheck is shaping the future of online trust and safety in low-resource language settings. This project is supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 2.
20 June 2024
MemeCraft is an innovative project that harnesses the power of generative AI to create impactful memes for social good. Developed at SUTD, this tool uses advanced language and visual models to generate memes that support important social movements like climate action and gender equality. By ensuring that the generated content is both humorous and respectful, MemeCraft aims to engage online audiences in meaningful discourse. This versatile technology can also be applied in other social activism and marketing campaigns, making it a powerful tool for spreading awareness and promoting positive change across various causes.
13 May 2024
AutoChart: A Dataset for Chart-to-Text Generation
AutoChart represents a significant advancement in natural language generation by addressing the underexplored area of chart-to-text generation. By introducing a novel framework that automatically generates analytical descriptions for various types of charts, AutoChart enables scalable and efficient data interpretation. The dataset, consisting of over 10,000 chart-description pairs, facilitates research in both natural language processing and computer vision, offering applications in academic writing, accessibility for visually impaired users, and automated report generation. AutoChart's integration of linguistic rhetorical moves further ensures that the generated descriptions are not only informative but also coherent and contextually relevant—paving the way for innovative applications in education, journalism, and automated content generation. Preliminary research in this area is funded by Living Sky Technologies.
16 August 2021
AutoChart: A multimodal text generation model that recognizes charts and generates a text analysis. The analysis shown above was generated automatically by AutoChart.
Analyzing Antisocial Behaviors Amid COVID-19 Pandemic
In the wake of the COVID-19 pandemic, online platforms saw an alarming rise in antisocial behaviors, including hate speech and xenophobia. Our project tackles this issue head-on by developing one of the largest annotated COVID-19 tweet datasets, comprising over 40 million tweets. Using a novel automated annotation framework, we analyzed toxic content targeting vulnerable communities, shedding light on new abusive lexicons that emerged during the pandemic. This research opens up pathways for developing more robust tools to monitor and curb harmful online behaviors during global crises. We also partner with the Saskatchewan Human Rights Commission in a sub-project to investigate online xenophobia against Asian communities during the COVID-19 pandemic. This sub-project is funded by the Social Sciences and Humanities Research Council of Canada Partnership Engage Grants.
21 July 2020
Analysis of online xenophobic behaviors on Twitter amidst the COVID-19 pandemic. We have collected and annotated a dataset of over 40 million COVID-19-related tweets.
Online social platforms (OSPs), such as Facebook, Twitter, and Instagram, have grown monumentally over recent years. It was reported that as of August 2017, Facebook had over 2 billion monthly active users, while Instagram and Twitter had over 700 million and 300 million monthly active user accounts, respectively. The vast amount of user-generated content and social data gathered in these behemoth platforms has made them rich data sources for academic and industrial research. However, most existing research has focused on analyzing and modeling user behaviors in a single-platform setting, neglecting the inter-dependencies of user behaviors across multiple OSPs. In this project, we design novel techniques that enable the analysis and modeling of user behaviors in multiple OSPs. In particular, we have developed algorithms that (i) link users' profiles across multiple social platforms, (ii) analyze users' topical interests and platform preferences across multiple OSPs, and (iii) model influential users in multiple OSPs.
Linky: A visual analytical tool that extracts the results from different user identity linkage methods performed on multiple online social networks and visualizes the user profiles, content and ego networks of the linked user identities.
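As a simplified illustration of the user identity linkage step described above, the sketch below scores how likely two profiles from different platforms belong to the same person using username and bio similarity. The features, weights, and example profiles are illustrative; the project's actual linkage algorithms are more sophisticated:

```python
# Sketch: a naive profile-similarity baseline for user identity linkage across
# platforms. Feature choices and weights are illustrative, not the project's models.
from difflib import SequenceMatcher

def username_similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_score(profile_a, profile_b, w_name=0.6, w_bio=0.4):
    """profile_*: dicts with 'username' and 'bio' fields."""
    name_sim = username_similarity(profile_a["username"], profile_b["username"])
    bio_a = set(profile_a["bio"].lower().split())
    bio_b = set(profile_b["bio"].lower().split())
    bio_sim = len(bio_a & bio_b) / max(len(bio_a | bio_b), 1)   # Jaccard over bio tokens
    return w_name * name_sim + w_bio * bio_sim

twitter_profile = {"username": "jane_doe_ml", "bio": "NLP researcher and coffee lover"}
insta_profile = {"username": "janedoe.ml", "bio": "coffee lover doing NLP research"}
print(f"Linkage score: {link_score(twitter_profile, insta_profile):.2f}")
```

Candidate pairs scoring above a threshold could then be passed to richer models (content, network, and temporal features) and visualized in a tool such as Linky.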