
Data-Centric Social Bias Mitigation for Large Language Model-based Cyber-Harassment Detection

Researchers aim to significantly improve the inclusion and accessibility of Large Language Model (LLM) applications in cybersecurity by addressing algorithmic bias, focusing on reducing unfair treatment of vulnerable populations identified by gender, race, religion, sexual orientation, and disability.

Funded by the CCI Hub


Project Investigators

Rationale and Background

Algorithmic bias in the AI systems known as large language models (LLMs) that monitor online behavior, specifically those detecting cyber-harassment, can cause harmless mentions of various groups, such as those defined by race, gender, or sexual orientation, to be mistakenly flagged as harassment because of ingrained biases.

LLMs demonstrate strong capabilities in text classification and reasoning, both in zero-shot settings and in scenarios with limited data (in-context learning and fine-tuning settings), making them valuable for cybersecurity tasks such as cyber-harassment detection. Yet their application is hindered by well-documented social biases.

Methodology

Researchers will follow a data-centric strategy comprising four main tasks:

  • Developing a large-scale, broad-coverage benchmark for examining social bias in cyber-harassment detection, covering multiple types of cyber-harassment and multiple bias dimensions.
  • Adapting the Contact Hypothesis from social psychology to augment prompts for debiasing LLMs in the zero-shot setting (a prompt-construction sketch follows this list).
  • Selecting effective demonstration data via optimization to debias LLMs in the in-context learning setting (see the selection sketch below).
  • Creating an effective fine-tuning dataset that integrates the Contact Hypothesis with the Bias Benchmark for QA (BBQ) dataset to remove bias from LLMs.
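
To illustrate the second task, below is a minimal sketch of how a Contact Hypothesis-inspired framing might be prepended to a zero-shot classification prompt. It assumes a generic text-generation callable rather than any particular LLM API, and the contact statements, group labels, and answer parsing are illustrative assumptions, not details from the project plan.

```python
# Minimal sketch (illustrative, not the project's actual method): prepend a
# short "positive intergroup contact" framing before asking the model to
# classify a post, in the hope of reducing biased false positives.

from typing import Callable

# Hypothetical one-sentence contact framings, keyed by the demographic
# group mentioned in the post.
CONTACT_STATEMENTS = {
    "religion": "Many people report positive everyday interactions with "
                "colleagues and neighbors of different faiths.",
    "sexual_orientation": "Many people have close friends and family "
                          "members of different sexual orientations.",
}

def build_debiased_prompt(post: str, group: str) -> str:
    """Build a zero-shot prompt with a contact-style framing prepended."""
    framing = CONTACT_STATEMENTS.get(group, "")
    return (
        f"{framing}\n\n"
        f"Post: \"{post}\"\n"
        "Question: Is this post cyber-harassment? Answer 'yes' or 'no'."
    )

def classify(post: str, group: str, generate: Callable[[str], str]) -> bool:
    """`generate` is any text-completion function (hosted LLM, local model, etc.)."""
    answer = generate(build_debiased_prompt(post, group))
    return answer.strip().lower().startswith("yes")
```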

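For the third task, the following is a minimal sketch of demonstration selection, assuming precomputed text embeddings and a simple greedy relevance-plus-coverage score; the weights and scoring are illustrative assumptions, not the project's actual optimization objective.

```python
# Minimal sketch (illustrative): greedily pick in-context demonstrations that
# balance relevance to the query against coverage of demographic groups.

import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def select_demonstrations(query_emb, pool_embs, pool_groups, k=4, coverage_weight=0.5):
    """Return indices of k demonstrations from a labeled candidate pool.

    pool_embs:   embeddings of candidate labeled examples
    pool_groups: demographic group tag for each candidate (e.g., 'religion')
    Each step scores candidates by similarity to the query plus a bonus for
    covering a group not yet represented in the selection.
    """
    selected, covered = [], set()
    for _ in range(k):
        best_idx, best_score = None, -np.inf
        for i, emb in enumerate(pool_embs):
            if i in selected:
                continue
            score = cosine(query_emb, emb)
            if pool_groups[i] not in covered:
                score += coverage_weight  # reward coverage of a new group
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        covered.add(pool_groups[best_idx])
    return selected
```
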
Projected Outcomes

Researchers will improve the data that LLMs learn from by developing new tools and datasets that help these models recognize and avoid bias, so that they treat all groups fairly.

They will also produce an open-source library that consolidates the developed data and methodologies, contributing to a safer and more inclusive digital environment.

Researchers will also seek funding from the NSF Computer and Information Science and Engineering (CISE) Information Integration and Informatics (III) core program or the NSF Secure and Trustworthy Cyberspace (SaTC) program.