Leveraging Large Language Models for Enhanced Software Security Analysis and Malware Detection

Researchers from William & Mary and George Mason University

Researchers will build a framework that uses Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to improve software security analysis and malware detection for Android applications.

By grounding LLM output in retrieved reference material, the framework addresses the challenge of LLM “hallucinations” in domain-specific tasks, making the generated analyses more reliable and accurate.

Funded by CCI’s Coastal Virginia Node and Northern Virginia Node

Rationale

The proliferation of Android apps has led to an increase in potentially harmful software, making efficient and accurate security analysis crucial. 

Current methods rely heavily on manual review by human experts, which is time-consuming and limited in scope. Machine learning approaches show promise, but they often lack explainability, which makes their results hard to verify.

The proposed framework aims to overcome these limitations by integrating LLMs with RAG systems to analyze Android application behavior. 
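
To make the retrieval step concrete, here is a minimal sketch of how retrieved reference material might be attached to a security question before it reaches an LLM. It is an illustration only, not the project's implementation: the knowledge-base entries, the `retrieve` and `build_prompt` helpers, and the word-overlap scoring are all assumptions made for this example, and no actual LLM is called.

```python
import re
from collections import Counter

# Hypothetical knowledge base; a real system would index vetted security references.
KNOWLEDGE_BASE = [
    "SmsManager.sendTextMessage sends an SMS and requires the SEND_SMS permission.",
    "TelephonyManager.getDeviceId returns a device identifier and requires READ_PHONE_STATE.",
    "LocationManager.getLastKnownLocation returns the last known location of the device.",
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z_]+", text.lower())


def retrieve(query, docs, k=2):
    """Return up to k documents that share at least one token with the query."""
    query_tokens = Counter(tokenize(query))
    scored = []
    for doc in docs:
        doc_tokens = Counter(tokenize(doc))
        overlap = sum((query_tokens & doc_tokens).values())
        scored.append((overlap, doc))
    # Highest overlap first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, docs, k=2):
    """Assemble a prompt that asks the model to answer only from retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, docs, k))
    return (
        "Answer using only the reference notes below.\n"
        f"Reference notes:\n{context}\n"
        f"Question: {query}\n"
    )


if __name__ == "__main__":
    print(build_prompt("Does this app send SMS messages?", KNOWLEDGE_BASE, k=1))
```

A production system would likely replace the word-overlap score with embedding-based search, but the shape is the same: retrieve first, then constrain the model to the retrieved context so its answer can be checked against a known source.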

Projected Outcomes

Researchers will focus on identifying call graphs and data-flow graphs related to security queries, as well as isolating malicious code snippets from Android project source code. 
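
As a rough illustration of what a call graph tied to a security query can look like, the sketch below searches a small hand-written call graph for a path from an app entry point to a sensitive API. The graph contents, the `find_call_path` helper, and the choice of breadth-first search are assumptions for this example; the source does not describe how the project extracts or queries its graphs.

```python
from collections import deque

# Hypothetical call graph: each method maps to the methods it calls.
CALL_GRAPH = {
    "MainActivity.onCreate": ["Tracker.start", "Ui.render"],
    "Tracker.start": ["Tracker.collect"],
    "Tracker.collect": ["TelephonyManager.getDeviceId", "Uploader.send"],
    "Uploader.send": ["HttpURLConnection.connect"],
    "Ui.render": [],
}


def find_call_path(graph, entry, target):
    """Breadth-first search for a shortest call chain from entry to target."""
    queue = deque([[entry]])
    visited = {entry}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for callee in graph.get(node, []):
            if callee not in visited:
                visited.add(callee)
                queue.append(path + [callee])
    return None  # target is not reachable from entry


if __name__ == "__main__":
    path = find_call_path(
        CALL_GRAPH, "MainActivity.onCreate", "TelephonyManager.getDeviceId"
    )
    print(" -> ".join(path))
```

Returning the full call chain, rather than a yes/no answer, is what makes such a result explainable: an analyst can follow each hop and verify it against the code.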

The project could substantially change how software security is approached by improving the scalability, accuracy, and efficiency of Android application security analysis.

By developing a system that can effectively extract hidden information from code and provide explainable results, this research contributes to creating more resilient and secure digital ecosystems.