Decentralized Detection of Software Supply Chain Attacks through Data Provenance Analytics

Researchers will develop an intrusion detection system (IDS) that detects sophisticated software supply chain attacks by analyzing system logs in a decentralized way while upholding data privacy standards.

Funded by the CCI Hub

Project Investigators

Principal Investigator (PI): Wajih Ul Hassan, University of Virginia’s Department of Computer Science

Rationale and Background

Software supply chain attacks exploit trusted software deployment processes, often rendering traditional intrusion detection systems (IDS) ineffective.

A traditional IDS monitors and analyzes system logs to detect known malware signatures. However, malware can camouflage itself within a trusted software update process and use that process to spread harmful code. Once a software update mechanism is subverted, malware can proliferate across every organization that installs the update.

In the wake of the SolarWinds attack, many software security vendors have begun analyzing customers' system logs to learn software behavior patterns and detect anomalies.

However, this practice raises data privacy and regulatory concerns. Accessing logs from multiple customers risks exposing sensitive or proprietary operational information, and managing such large private datasets adds further regulatory complexity.

Methodology

Researchers propose an approach that combines data provenance graphs with federated learning (FL): system logs are represented as provenance graphs, which are then analyzed with graph neural networks (GNNs).
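As an illustration of how raw log records can become a provenance graph, the minimal sketch below turns audit events into a directed graph whose edges follow the flow of information between processes, files, and sockets. The event schema (Event with subject, action, obj, ts), the set of actions treated as inbound, and the function name build_provenance_graph are all hypothetical assumptions for this example; the project's actual log sources and graph schema are not specified.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """One audit-log record in a hypothetical schema: subject performs action on obj."""
    subject: str  # acting process, e.g. "proc:updater"
    action: str   # e.g. "read", "write", "recv", "connect", "fork"
    obj: str      # target entity, e.g. "file:/opt/app/plugin.dll"
    ts: int       # event timestamp


# Actions in which information flows from the object into the subject
# (an assumption for this sketch; all other actions flow subject -> object).
INBOUND_ACTIONS = {"read", "recv"}

Edge = Tuple[str, str, int]  # (successor node, action, timestamp)


def build_provenance_graph(events: Iterable[Event]) -> Dict[str, List[Edge]]:
    """Build a directed provenance graph whose edges follow information flow."""
    graph: Dict[str, List[Edge]] = defaultdict(list)
    for e in events:
        if e.action in INBOUND_ACTIONS:
            src, dst = e.obj, e.subject
        else:
            src, dst = e.subject, e.obj
        graph[src].append((dst, e.action, e.ts))
        graph.setdefault(dst, [])  # make sure sink nodes also appear in the graph
    return dict(graph)


events = [
    Event("proc:updater", "recv", "sock:203.0.113.5:443", 1),
    Event("proc:updater", "write", "file:/opt/app/plugin.dll", 2),
    Event("proc:app", "read", "file:/opt/app/plugin.dll", 3),
    Event("proc:app", "fork", "proc:cmd", 4),
]
graph = build_provenance_graph(events)
for node, out_edges in graph.items():
    print(node, "->", out_edges)
```

Orienting edges by information flow (rather than by which process issued the system call) is what later lets causal questions, such as "where did this file's contents come from?", be answered by simple graph traversal.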

FL enables decentralized training of machine learning models on distributed data without sharing the raw data. This allows the model to learn collectively from diverse customer hosts while preserving the privacy of each user's data.
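To make the FL idea concrete, here is a minimal sketch of one widely used aggregation scheme, federated averaging (FedAvg): each client trains on its own data, and the server only averages the resulting model weights, weighted by local dataset size. The project description does not name its aggregation rule, so FedAvg is an assumption here, and a tiny linear model stands in for the GNN; the helper names local_update, fed_avg, and federated_round are hypothetical.

```python
from typing import List, Sequence, Tuple

Weights = List[float]
Dataset = Sequence[Tuple[List[float], float]]  # (features, target) pairs


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Run a few epochs of gradient descent on a linear least-squares model,
    using only this client's data. Raw data never leaves the client."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2 * err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def fed_avg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Server step: average the client models, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]


def federated_round(global_w: Weights, client_datasets: List[Dataset]) -> Weights:
    """One FL round: broadcast, train locally on every client, aggregate."""
    updates = [local_update(global_w, d) for d in client_datasets]
    return fed_avg(updates, [len(d) for d in client_datasets])


# Two hosts whose private data both follow y = 2x.
client_a = [([1.0], 2.0), ([2.0], 4.0)]
client_b = [([0.5], 1.0), ([1.5], 3.0)]
w = [0.0]
for _ in range(20):
    w = federated_round(w, [client_a, client_b])
print(w)  # the shared model converges toward w = 2.0
```

Only the weight vectors cross the network in this sketch, which is the property the paragraph above relies on; the gradients or weights themselves can still leak information, which is what the LDP component described later is meant to address.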

Using GNNs, the system learns detailed node embeddings that capture benign software behavior, so that deviations from this behavior can be flagged as potential security threats.
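The sketch below shows, under simplifying assumptions, how a GNN layer can turn a node's own features and its neighborhood into an embedding, and how distance from embeddings seen during benign operation can serve as an anomaly score. It uses a single GraphSAGE-style mean-aggregation layer with one-hot node-type features and a fixed identity weight matrix so the output stays readable; a real model would learn the weights, use richer features, and stack several layers. The layer choice, the nearest-neighbor scoring rule, and the names sage_layer and anomaly_score are assumptions, not the project's published design.

```python
import math
from typing import Dict, List

Vector = List[float]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def matvec(W: List[Vector], v: Vector) -> Vector:
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def sage_layer(features: Dict[str, Vector], neighbors: Dict[str, List[str]],
               W: List[Vector]) -> Dict[str, Vector]:
    """One GraphSAGE-style layer with mean aggregation:
    h_v' = ReLU(W @ concat(h_v, mean of h_u over neighbors u of v))."""
    dim = len(next(iter(features.values())))
    out = {}
    for v, h in features.items():
        nbrs = neighbors.get(v, [])
        agg = ([sum(features[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
               if nbrs else [0.0] * dim)
        out[v] = relu(matvec(W, h + agg))  # h + agg concatenates the two lists
    return out


def anomaly_score(embedding: Vector, benign: List[Vector]) -> float:
    """Euclidean distance to the closest embedding observed during benign training."""
    return min(math.dist(embedding, b) for b in benign)


# One-hot node-type features: [process, file, socket].
PROC, FILE, SOCK = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
# Identity weights keep the embedding readable; a trained model would learn W.
W = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]

# Benign behavior: the updater only touches its config file.
benign = sage_layer({"proc:updater": PROC, "file:config": FILE},
                    {"proc:updater": ["file:config"], "file:config": ["proc:updater"]}, W)
# Observed behavior: the updater now talks to a network socket.
observed = sage_layer({"proc:updater": PROC, "sock:remote": SOCK},
                      {"proc:updater": ["sock:remote"], "sock:remote": ["proc:updater"]}, W)
score = anomaly_score(observed["proc:updater"], list(benign.values()))
print(f"anomaly score for proc:updater: {score:.3f}")
```

Because the embedding mixes a node's own type with its neighbors' types, the same process gets a different embedding when its neighborhood changes, which is the kind of behavioral deviation the paragraph above describes.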

Provenance graphs capture the causal relationships among system entities such as processes, files, and network sockets, which supports the development of a more robust IDS. By following these causal paths, system defenders can reconstruct the complete attack story.
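Attack story reconstruction on a provenance graph is commonly done by backward tracing: starting from the entity where an alert fired, walk edges in reverse, keeping only events that happened no later than the point at which their destination became relevant. The sketch below assumes edges of the form (source, destination, timestamp) pointing along information flow; the function name backward_trace and the example entities are hypothetical, and the project may use a different reconstruction algorithm.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, int]  # (source, destination, timestamp), along information flow


def backward_trace(edges: Iterable[Edge], alert_node: str, alert_time: int) -> Set[Edge]:
    """Collect every edge that could causally have influenced alert_node at alert_time.
    An edge (src, dst, ts) is followed only if ts is no later than the time at which
    dst is known to matter, so later, unrelated activity is excluded."""
    incoming: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for src, dst, ts in edges:
        incoming[dst].append((src, ts))
    story: Set[Edge] = set()
    latest = {alert_node: alert_time}  # latest relevant time seen for each node
    queue = deque([(alert_node, alert_time)])
    while queue:
        node, t = queue.popleft()
        for src, ts in incoming[node]:
            if ts <= t:
                story.add((src, node, ts))
                # Revisit src only if this path makes a later time window relevant.
                if ts > latest.get(src, float("-inf")):
                    latest[src] = ts
                    queue.append((src, ts))
    return story


edges = [
    ("sock:203.0.113.5:443", "proc:updater", 1),
    ("proc:updater", "file:/opt/app/plugin.dll", 2),
    ("file:/opt/app/plugin.dll", "proc:app", 3),
    ("proc:app", "proc:cmd", 4),
    ("file:/home/user/notes.txt", "proc:app", 5),  # happens after the alert
]
story = backward_trace(edges, "proc:cmd", 4)
for src, dst, ts in sorted(story, key=lambda e: e[2]):
    print(ts, src, "->", dst)
```

In this toy trace, the suspicious shell is linked back through the loaded plugin and the updater to the remote socket, while the file read at time 5 is excluded because it could not have caused the earlier event.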

Projected Outcomes

Combining provenance graphs with FL will allow the researchers' system to analyze logs from multiple hosts in a decentralized manner, providing a view of software behavior across many hosts.

By incorporating local differential privacy (LDP), the system will keep the data and gradients used to improve the model private while still tracking suspicious software activity patterns.
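One standard way to apply LDP to model updates is the Laplace mechanism applied on each client before anything leaves the host: clip the gradient to a bounded L1 norm, then add Laplace noise calibrated to that bound. The project description does not specify its LDP mechanism, so the choice of the Laplace mechanism, the clipping bound, and the names clip_l1, laplace_noise, and privatize_gradient are assumptions for this sketch; it covers gradients only, not raw data.

```python
import random
from typing import List, Optional


def clip_l1(grad: List[float], clip: float) -> List[float]:
    """Scale grad down so that its L1 norm is at most clip."""
    norm = sum(abs(g) for g in grad)
    if norm <= clip:
        return list(grad)
    return [g * clip / norm for g in grad]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_gradient(grad: List[float], clip: float, epsilon: float,
                       rng: Optional[random.Random] = None) -> List[float]:
    """Apply the Laplace mechanism on the client before the update leaves the host.
    Any two clipped vectors differ by at most 2 * clip in L1 norm, so per-coordinate
    noise with scale 2 * clip / epsilon gives epsilon-LDP for this single vector."""
    if rng is None:
        rng = random.Random()
    clipped = clip_l1(grad, clip)
    scale = 2.0 * clip / epsilon
    return [g + laplace_noise(scale, rng) for g in clipped]


rng = random.Random(0)
noisy = privatize_gradient([0.8, -0.4, 1.6], clip=1.0, epsilon=1.0, rng=rng)
print(noisy)
```

Each noisy update on its own reveals little, but because the noise has zero mean it largely cancels when the server averages updates from many hosts, which is how detection quality can be retained. Smaller epsilon means stronger privacy and noisier updates, and privacy loss accumulates over repeated training rounds.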

Results will include: