S2M3: Split-and-Share Multi-Modal Models for Distributed Multi-Task Inference on the Edge
Research Paper Showcase 2025
Abstract
With the advancement of Artificial Intelligence (AI) towards multiple modalities (language, vision, speech, etc.), multi-modal models have increasingly been used across various applications (e.g., visual question answering or image generation/captioning). Despite the success of AI as a service for multi-modal applications, it relies heavily on the cloud, which is constrained by bandwidth, latency, and privacy concerns and becomes unavailable under network or server failures. While on-device AI is gaining popularity, supporting multiple tasks on edge devices poses significant resource challenges.
To address this, we introduce S2M3, a split-and-share multi-modal architecture for multi-task inference on edge devices. Inspired by the general-purpose nature of multi-modal models, which are composed of multiple modules (encoder, decoder, classifier, etc.), we propose to split multi-modal models into functional-level modules and then share common modules across tasks, thereby reducing resource usage.
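To illustrate the split-and-share idea, here is a minimal sketch (not the paper's implementation; all names are hypothetical): each functional module is loaded once into a registry, and task pipelines are composed from shared module instances rather than monolithic models.

```python
# Illustrative sketch: a registry that loads each functional module once
# and shares it across task pipelines (hypothetical names, not the S2M3 code).

class ModuleRegistry:
    """Caches functional modules (encoders, decoders, classifiers) by name."""

    def __init__(self, loader):
        self._loader = loader    # hypothetical callable: name -> loaded module
        self._modules = {}       # shared instances, loaded at most once

    def get(self, name):
        if name not in self._modules:
            self._modules[name] = self._loader(name)
        return self._modules[name]


def build_pipeline(registry, module_names):
    """Compose a task pipeline from shared modules instead of a full model copy."""
    return [registry.get(name) for name in module_names]


# Example: two tasks reuse the same vision encoder, so only three modules
# are resident in memory instead of four.
registry = ModuleRegistry(loader=lambda name: f"<weights of {name}>")
captioning = build_pipeline(registry, ["vision_encoder", "text_decoder"])
vqa        = build_pipeline(registry, ["vision_encoder", "answer_classifier"])
assert captioning[0] is vqa[0]   # the vision encoder is shared, not duplicated
```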
To address the cross-model dependency arising from module sharing, we propose a greedy module-level placement with per-request parallel routing that prioritizes compute-intensive modules (a simplified sketch follows below). Through experiments on a testbed consisting of 14 multi-modal models across 5 tasks and 10 benchmarks, we demonstrate that S2M3 can reduce memory usage by up to 50% and 62% in single-task and multi-task settings, respectively, without sacrificing accuracy. Furthermore, S2M3 achieves optimal placement in 89 out of 95 instances (93.7%) while reducing inference latency by up to 56.9% on resource-constrained devices, compared to cloud AI.
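The following sketch shows one way a greedy, compute-priority placement could look, assuming per-module compute and memory estimates and per-device capacities. The paper's actual objective, constraints, and routing logic may differ; function and field names here are hypothetical.

```python
# Simplified greedy module-level placement: modules are placed in descending
# order of compute cost, each onto the feasible device with the lowest
# estimated execution time (hypothetical cost model, not the S2M3 algorithm).

def greedy_placement(modules, devices):
    """
    modules: list of dicts {"name", "compute", "memory"}
    devices: list of dicts {"name", "speed", "free_memory"}
    Returns a mapping: module name -> device name.
    """
    placement = {}
    # Prioritize compute-intensive modules so they claim the fastest feasible device.
    for mod in sorted(modules, key=lambda m: m["compute"], reverse=True):
        candidates = [d for d in devices if d["free_memory"] >= mod["memory"]]
        if not candidates:
            raise RuntimeError(f"no device can host module {mod['name']}")
        # Greedy choice: minimize estimated execution time on the chosen device.
        best = min(candidates, key=lambda d: mod["compute"] / d["speed"])
        best["free_memory"] -= mod["memory"]
        placement[mod["name"]] = best["name"]
    return placement


# Example: the heavy vision encoder lands on the faster device first.
modules = [
    {"name": "vision_encoder", "compute": 8.0, "memory": 4.0},
    {"name": "text_decoder",   "compute": 3.0, "memory": 2.0},
]
devices = [
    {"name": "edge_gpu", "speed": 4.0, "free_memory": 5.0},
    {"name": "edge_cpu", "speed": 1.0, "free_memory": 8.0},
]
print(greedy_placement(modules, devices))
# {'vision_encoder': 'edge_gpu', 'text_decoder': 'edge_cpu'}
```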
Authors
- JinYi Yoon, Virginia Tech
- JiHo Lee, Virginia Tech
- Ting He, Pennsylvania State University
- Nakjung Choi, Nokia Bell Labs
- Bo Ji, Virginia Tech
Publication
- Venue: IEEE International Conference on Distributed Computing Systems (ICDCS)
- Date: March 26, 2025