
Securing Interactions with AI-Based Question-Answering Dialog Systems

Researchers will study truthfulness and toxicity in closed-book language model-based question-answering (QA) systems in real-world settings to characterize and mitigate vulnerabilities.

Funded by: CCI Southwest Virginia Node, in collaboration with Virginia Tech’s Institute for Society, Culture, and Environment (ISCE) and the Tech4Humanity initiative


Project Investigators

Principal Investigator (PI): Bimal Viswanath, assistant professor, Virginia Tech Department of Computer Science

Co-PI: Megan Duncan, assistant professor, Virginia Tech School of Communication

Rationale and Background

Recent advances in large language models (LMs) have revolutionized natural language processing (NLP). A notable application of LMs is dialog systems, in particular question-answering (QA) systems.

Traditionally, QA systems have been designed as open-book models, in which information retrieval methods are used to query an external knowledge base. However, some LMs have enabled closed-book generative QA systems, in which no external references or context are required: the model answers questions using knowledge internalized by the LM during training.
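The closed-book setting can be illustrated in a few lines of code. The sketch below is a minimal example, assuming the Hugging Face transformers library and an illustrative model (google/flan-t5-base) that is not named in the project; the point is simply that no retrieved documents or external context are supplied with the question.

```python
# Minimal closed-book QA sketch (illustrative model and prompt format).
from transformers import pipeline

# text2text-generation loads a generative seq2seq LM; the answer comes only
# from knowledge internalized during pretraining, not from retrieved text.
qa = pipeline("text2text-generation", model="google/flan-t5-base")

question = "Who wrote the novel Moby-Dick?"
result = qa(f"Answer the question: {question}", max_new_tokens=32)
print(result[0]["generated_text"])
```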

LM-based QA systems are vulnerable to harmful biases in their training data: LMs are often trained on uncurated text from online platforms, so the training data can contain falsehoods and toxic messages.

Malicious actors can exploit this vulnerability, exposing users to content based on false information or toxic material that can cause emotional harm.

Methodology

There will be three primary areas of focus. The team will:

  • Conduct a large-scale measurement study to understand the truthfulness and toxicity of QA systems when exposed to questions humans tend to ask on the Web, using a realistic setting to test vulnerabilities (a rough sketch of this idea follows this list). 
  • Develop machine learning-guided approaches that discover vulnerabilities in QA systems by generating adversarial questions designed to prompt toxic or untrue responses, allowing practitioners to find problematic behavior before deployment. 
  • Develop methods and safety datasets that detoxify QA systems and improve their truthfulness, using fine-tuning approaches and text generation/decoding strategies that reduce falsehoods and toxic responses.
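As a rough illustration of the measurement idea in the first bullet, the sketch below pairs a closed-book QA model with an off-the-shelf toxicity classifier and scores each generated answer. The model names (google/flan-t5-base, unitary/toxic-bert) and the sample questions are illustrative assumptions, not the models or datasets the project will use.

```python
# Sketch: score closed-book QA answers with a toxicity classifier.
from transformers import pipeline

qa = pipeline("text2text-generation", model="google/flan-t5-base")
toxicity = pipeline("text-classification", model="unitary/toxic-bert")

# Illustrative stand-ins for "questions humans tend to ask on the Web".
questions = [
    "Why do people believe vaccines cause autism?",
    "What is the capital of Australia?",
]

for q in questions:
    answer = qa(f"Answer the question: {q}", max_new_tokens=64)[0]["generated_text"]
    # The classifier returns a label and score for the answer text; a real
    # measurement study would aggregate such scores over a large question set.
    tox = toxicity(answer)[0]
    print(f"Q: {q}\nA: {answer}\nToxicity: {tox['label']} ({tox['score']:.2f})\n")
```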

Projected Outcomes

  • Findings will be submitted to top security venues, such as the Institute of Electrical and Electronics Engineers (IEEE) Symposium on Security and Privacy (S&P), USENIX Security, or the ACM Conference on Computer and Communications Security (CCS).
  • Software tools to automatically test QA systems for vulnerabilities will be released to the community. 
  • Safety datasets that can be used to fine-tune QA models to improve truthfulness and mitigate toxicity will be released to the community.
  • The framework will be incorporated into the existing testbed and software factory with the assistance of the CCI AI Assurance team.
  • Software tools and findings will be demonstrated at CCI events.