S2M3: Split-and-Share Multi-Modal Models for Distributed Multi-Task Inference on the Edge

Research Paper Showcase 2025

Abstract

With the advancement of Artificial Intelligence (AI) towards multiple modalities (language, vision, speech, etc.), multi-modal models have increasingly been used across various applications (e.g., visual question answering or image generation/captioning). Despite the success of AI as a service for multi-modal applications, it relies heavily on the cloud, which is constrained by bandwidth, latency, privacy concerns, and unavailability under network or server failures. Although on-device AI is becoming popular, supporting multiple tasks on edge devices poses significant resource challenges.

To address this, we introduce S2M3, a split-and-share multi-modal architecture for multi-task inference on edge devices. Inspired by the general-purpose nature of multi-modal models, which are composed of multiple modules (encoder, decoder, classifier, etc.), we propose splitting multi-modal models into functional-level modules and then sharing common modules across tasks, thereby reducing resource usage.
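The memory effect of sharing can be illustrated with a small sketch. It is not the authors' implementation: the task names, module names, and sizes below are hypothetical, and the only idea taken from the abstract is that a module common to several tasks is loaded once instead of once per model.

```python
# Hypothetical module sizes in MB; names and numbers are illustrative only.
MODULE_SIZE_MB = {
    "vision_encoder": 300,
    "text_encoder": 200,
    "text_decoder": 400,
    "classifier": 10,
}

# Hypothetical tasks, each served by a model split into functional modules.
MODELS = {
    "image_captioning": ["vision_encoder", "text_decoder"],
    "visual_qa": ["vision_encoder", "text_encoder", "classifier"],
    "image_text_retrieval": ["vision_encoder", "text_encoder"],
}


def memory_without_sharing(models, sizes):
    """Every model keeps its own copy of each of its modules."""
    return sum(sizes[m] for modules in models.values() for m in modules)


def memory_with_sharing(models, sizes):
    """Each distinct module is loaded once and reused across tasks."""
    unique = {m for modules in models.values() for m in modules}
    return sum(sizes[m] for m in unique)


separate = memory_without_sharing(MODELS, MODULE_SIZE_MB)
shared = memory_with_sharing(MODELS, MODULE_SIZE_MB)
print(f"separate: {separate} MB, shared: {shared} MB")
```

In this toy setup the saving comes entirely from the two encoders being reused; the figures reported in the abstract come from the authors' testbed, not from a calculation like this one.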

To address cross-model dependency arising from module sharing, we propose a greedy module-level placement with per-request parallel routing by prioritizing compute-intensive modules. Through experiments on a testbed consisting of 14 multi-modal models across 5 tasks and 10 benchmarks, we demonstrate that S2M3 can reduce memory usage by up to 50% and 62% in single-task and multi-task settings, respectively, without sacrificing accuracy. Furthermore, S2M3 achieves optimal placement in 89 out of 95 instances (93.7%) while reducing inference latency by up to 56.9% on resource-constrained devices, compared to cloud AI.
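A greedy placement that prioritizes compute-intensive modules might look roughly like the sketch below. This is an assumption-laden illustration rather than the algorithm from the paper: the `Device` class, the cost numbers, and the rule "put the next module on the fastest device that still has enough memory" are all hypothetical, and per-request parallel routing is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    memory_mb: int   # remaining memory
    speed: float     # relative compute speed, higher is faster (hypothetical)
    placed: list = field(default_factory=list)


def greedy_place(modules, devices):
    """Place modules one by one, most compute-intensive first.

    modules: dict mapping module name -> (memory_mb, compute_cost).
    Returns a dict mapping module name -> device name.
    """
    placement = {}
    by_cost = sorted(modules.items(), key=lambda kv: kv[1][1], reverse=True)
    for name, (mem, _cost) in by_cost:
        candidates = [d for d in devices if d.memory_mb >= mem]
        if not candidates:
            raise ValueError(f"no device can host {name}")
        best = max(candidates, key=lambda d: d.speed)  # assumed selection rule
        best.memory_mb -= mem
        best.placed.append(name)
        placement[name] = best.name
    return placement


# Hypothetical modules: (memory in MB, relative compute cost).
modules = {
    "vision_encoder": (300, 9.0),
    "text_decoder": (400, 7.0),
    "text_encoder": (200, 4.0),
    "classifier": (10, 0.5),
}
devices = [
    Device("edge_gpu", 600, 4.0),
    Device("laptop", 500, 2.0),
    Device("phone", 256, 1.0),
]
placement = greedy_place(modules, devices)
print(placement)
```

Ordering by compute cost gives the heaviest modules first pick of the fastest devices, which is the intuition the abstract points to; a real placement would also need to account for communication cost between devices and for modules shared by several models.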


Authors

  • JinYi Yoon, Virginia Tech
  • JiHo Lee, Virginia Tech
  • Ting He, Pennsylvania State University
  • Nakjung Choi, Nokia Bell Labs
  • Bo Ji, Virginia Tech

Publication

  • Venue: IEEE International Conference on Distributed Computing Systems (ICDCS)
  • Date: 3/26/2025

Related Papers

CTINEXUS: Automatic Cyber Threat Intelligence Knowledge Graph Construction Using Large Language Models

An Exploratory Mixed-methods Study on General Data Protection Regulation (GDPR) Compliance in Open-Source Software

Principled and Automated Approach for Investigating AR/VR Attacks

Security Enhancement in UAV Swarms: A Case Study Using Federated Learning and SHAP Analysis

Scale-MIA: A Scalable Model Inversion Attack against Secure Federated Learning via Latent Space Reconstruction

"This is not a scam!": Assessment of an awareness raising program tackling older adults' scam victimization in a multi-method study

Unraveling the Complexities of MTA-STS Deployment and Management in Securing Email