Defending Against Malicious LLM-Driven Agents Utilized for Online Abuse Directed at At-Risk Communities

Researchers will create tools to counteract threats from AI-driven malicious entities on social media platforms. Raising awareness of these threats will facilitate the development of effective strategies to safeguard marginalized and vulnerable communities from exploitation.

Funded by the CCI Southwest Node

Project Investigators

Principal Investigator (PI): Bimal Viswanath, Virginia Tech Department of Computer Science
Co-PI: Yixin Sun, University of Virginia (UVA) Department of Computer Science
Co-PI: Lanfei Shi, UVA McIntire School of Commerce

Rationale and Background

Generative AI models can produce synthetic videos, images, audio, or text, which can be used to exploit vulnerable populations. They can also power malicious bots that behave as if they were real users, which could lead to hard-to-detect fake accounts on social media platforms. Such bots could enable the spread of disinformation, amplify problematic content, harass minorities, falsify abuse reporting, and manipulate opinion. A key issue is that the generative AI schemes powering Large Language Models (LLMs) can be adapted to generate deepfake behavior.

Methodology

Researchers will:

Build a software framework to characterize the threat of malicious agents that can generate deepfake behavior.
Create a behavior synthesizer tool to assess the effectiveness of existing defenses to detect malicious bots.
Leverage LLM-based methods to generate or forecast time-series data, using a partial trace of a real user's behavior to forecast new behavior.
Focus on determining whether the behavior is synthetic or real instead of looking for malicious activity.
Leverage imperfections or artifacts in a generated behavioral trace to identify deepfake behavior.
Exploit the information asymmetry between attackers and defenders to build robust defenses.
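
As a rough illustration of what looking for artifacts in a behavioral trace could mean in practice, the sketch below summarizes the timing of an account's events with two simple statistics. It is not the project's method: the function names, the choice of features, and the 60-second bin width are all assumptions made for this example, and real detectors would likely rely on far richer signals.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def inter_event_gaps(timestamps):
    """Return the gaps (in seconds) between consecutive events, ordered by time."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def trace_features(timestamps, bin_seconds=60):
    """Summarize a trace's timing with two illustrative (hypothetical) features."""
    gaps = inter_event_gaps(timestamps)
    if not gaps:
        raise ValueError("need at least two events")

    avg = mean(gaps)
    # Coefficient of variation of the gaps: 0 for perfectly regular activity.
    gap_cv = pstdev(gaps) / avg if avg > 0 else 0.0

    # Shannon entropy (in bits) of the gaps after bucketing them into
    # fixed-width bins; the bin width is an arbitrary assumption.
    counts = Counter(int(gap // bin_seconds) for gap in gaps)
    total = len(gaps)
    gap_entropy_bits = sum(
        -(count / total) * math.log2(count / total) for count in counts.values()
    )

    return {"gap_cv": gap_cv, "gap_entropy_bits": gap_entropy_bits}


# Made-up traces: event times in seconds since the start of observation.
regular_trace = [0, 60, 120, 180, 240]
bursty_trace = [0, 5, 7, 300, 2000]
regular_features = trace_features(regular_trace)
bursty_features = trace_features(bursty_trace)
```

Features like these could feed a classifier that labels a trace as synthetic or real. Whether such simple timing statistics would separate LLM-generated behavior from genuine user behavior is an open question that this sketch does not answer.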

Projected Outcomes

Researchers will contribute to the creation of novel deepfake behavior detection tools. Online service providers will be able to utilize these tools to flag and block problematic behavior by malicious bots. These mitigation strategies will protect marginalized and vulnerable groups from abuse by LLM-driven malicious accounts on social media platforms.