A Multitask LLM-Based Vulnerability Detector with Conversational Assistance
Researchers from William & Mary and George Mason University
Researchers will develop a multitask Large Language Model (LLM)-based vulnerability detector capable of detecting, pinpointing, and explaining vulnerable software functions, as well as suggesting solutions.
Funded by the CCI Hub
Project Investigators
- Principal Investigator (PI): Huajie Shao, William & Mary Department of Computer Science
- Co-PI: Yue Xiao, William & Mary Department of Computer Science
- Co-PI: Xiaokuan Zhang, George Mason University Computer Science
Rationale
Software vulnerabilities pose serious risks to systems, potentially leading to crashes, data loss, and security breaches.
Classical static analysis-based vulnerability-detection tools often suffer from high false positive or false negative rates, and struggle to generalize to new types of vulnerabilities.
To address this, some studies have introduced deep learning methods, but these can only classify whether a code snippet is vulnerable, without pinpointing the vulnerable functions or explaining the flaw. Existing LLM-based methods, meanwhile, limit their focus to specific aspects, such as detecting vulnerability types or locations.
Projected Outcomes
Researchers will:
- Create a comprehensive dialogue-based vulnerability benchmark encompassing a wide range of tasks, including vulnerability type detection, localization, and explanation.
- Develop a knowledge-guided multitask LLM-based detector using instruction fine-tuning.
The LLM-based vulnerability detector will be evaluated on both the constructed benchmark dataset and software vulnerabilities in real-world applications, fundamentally enhancing software security.
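The announcement does not specify the benchmark's data format, but a dialogue-based, multitask instruction-tuning record covering the project's four tasks (detection, localization, explanation, and fix suggestion) might be organized roughly as follows. This is a minimal sketch; the field names, task prompts, and example snippet are illustrative assumptions, not the project's actual schema.

```python
import json

# Hypothetical task prompts for the four subtasks named in the project;
# the phrasing and keys are assumptions for illustration only.
TASKS = {
    "detect": "Is this function vulnerable? If so, what is the CWE type?",
    "locate": "Which lines contain the vulnerability?",
    "explain": "Explain why this code is vulnerable.",
    "fix": "Suggest a patched version of the function.",
}

def make_record(code: str, answers: dict) -> dict:
    """Pack one code snippet and its per-task answers into a single
    multi-turn conversation suitable for instruction fine-tuning."""
    turns = [{"role": "user", "content": f"```c\n{code}\n```"}]
    for task, question in TASKS.items():
        turns.append({"role": "user", "content": question})
        turns.append({"role": "assistant", "content": answers[task]})
    return {"conversation": turns}

# Toy example: a classic unbounded strcpy into a fixed-size buffer.
snippet = "char buf[8];\nstrcpy(buf, user_input);"
record = make_record(snippet, {
    "detect": "Yes: CWE-121 (stack-based buffer overflow).",
    "locate": "Line 2: the unbounded strcpy call.",
    "explain": "strcpy copies user_input into an 8-byte buffer "
               "without a length check, so longer input overflows it.",
    "fix": "Use strncpy(buf, user_input, sizeof(buf) - 1) "
           "and null-terminate the buffer.",
})
print(json.dumps(record, indent=2))
```

Packing all four tasks into one conversation per snippet is one plausible way to realize "dialogue-based" multitask fine-tuning; alternatives include emitting one single-turn record per task.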